Polynomial Systems and some Applications to Statistics

4 downloads 0 Views 323KB Size Report
Polynomial Systems and some Applications to Statistics. Lorenzo ... We let K be a field, n ≥ 1, P = K[x1,...,xn] a polynomial ring, r ≥ 1, σ a module term ...... Gröbner basis G of the ideal I(X), and the list of power products O = Oσ(I(X)) whose.
Polynomial Systems and some Applications to Statistics Lorenzo Robbiano



Abstract. In the first part of my lectures we study systems of polynomial equations from the point of view of Gr¨obner bases. In the second part we introduce problems from Design of Experiments, a branch of Statistics, and show how to use Computational Commutative Algebra to solve some of them. Information about the computer algebra system CoCoA can be found at the URL http://cocoa.dima.unige.it/ The fundamental material needed as a prerequisite is contained in [KR00].

Contents 1 PRELIMINARIES 1.1 Gr¨obner Bases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.2 Nullstellensatz . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

2 2 6

2 SOLVING POLYNOMIAL SYSTEMS 7 2.1 Zero-Dimensional Ideals and Gr¨obner Bases . . . . . . . . . . . . . . . . . . 7 2.2 Points and Buchberger M¨oller-Algorithm . . . . . . . . . . . . . . . . . . . . 12 3 APPLICATIONS TO STATISTICS 16 3.1 Identifying the Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 3.2 Some Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 3.3 Two Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29



Dipartimento di Matematica, Universit` a di Genova, Italy,

E-mail: [email protected]

1

PRELIMINARIES

In this chapter we review on Gr¨obner Bases. In particular we recall the notion of the Reduced Gr¨obner basis of an ideal or module, with an application to the notion of Field of Definition. Then we recall the fundamental Hilbert’s Nullstellensatz. The material outlined here is fully covered in the book [KR00].

1.1

Gr¨ obner Bases

We let K be a field, n ≥ 1, P = K[x1 , . . . , xn ] a polynomial ring, r ≥ 1, σ a module term ordering on P r . Theorem 1.1 (Characterization of Gr¨ obner Bases) For a set of elements G = {g1 , . . . , gs } ⊆ P r \ {0} which generates a submodule M = G

hg1 , . . . , gs i ⊆ P r , let −→ be the rewrite rule defined by G, let G be the tuple (g1 , . . . , gs ), let λ be the map λ : P s −→ P r defined by εi 7→ gi , and let Λ : P s −→ P r be the map defined by εi 7→ LMσ (gi ). Then the following conditions are equivalent. A1 ) For every m ∈ M \ {0}, there is a representation m = f1 g1 + · · · + fs gs with f1 , . . . , fs ∈ P such that LTσ (m) P ≥σ LTσ (fi gi ) for i ∈ {1, . . . , s} with fi gi 6= 0, i.e. such that LTσ (m) ≥σ degσ,G ( si=1 fi εi ). A2 ) For every m ∈ M \ {0}, there is a representation m = f1 g1 + · · · + fs gs with f1 , . . . , fs ∈ P such that LTσ (m) P = maxσ {LTσ (fi gi ) | i ∈ {1, . . . , s}, fi gi 6= 0}, i.e. such that LTσ (m) = degσ,G ( si=1 fi εi ). B1 ) The set {LTσ (g1 ), . . . , LTσ (gs )} generates the Tn -monomodule LTσ {M }. B2 ) The set {LTσ (g1 ), . . . , LTσ (gs )} generates the P -module LTσ (M ). G

C1 ) For an element m ∈ P r, we have m −→ 0 if and only if m ∈ M . G

C2 ) If m ∈ M is irreducible with respect to −→, then we have m = 0. G

C3 ) For every element m1 ∈ P r, there is a unique element m2 ∈ P r such that m1 −→ m2 G

and m2 is irreducible with respect to −→. G

G

C4 ) If m1 , m2 , m3 ∈ P r satisfy m1 −→ m2 and m1 −→ m3 , then there exists an element G

G

m4 ∈ P r such that m2 −→ m4 and m3 −→ m4 . D1 ) Every homogeneous element of Syz(LMσ (G)) has a lifting in Syz(G). D2 ) There exists a homogeneous system of generators of Syz(LMσ (G)) consisting entirely of elements which have a lifting in Syz(G). D3 ) There exists a finite homogeneous system of generators of Syz(LMσ (G)) consisting entirely of elements which have a lifting in Syz(G).

Definition 1.2 Let G = {g1 , . . . , gs } ⊆ P r \ {0} be a set of elements which generates a submodule M = hg1 , . . . , gs i ⊆ P r . If the conditions of Theorem 1.1 are satisfied, then G is called a Gr¨ obner basis of M with respect to σ or a σ-Gr¨ obner basis of M . In the case M = h0i, we shall say that G = ∅ is a σ-Gr¨ obner basis of M . Corollary 1.3 (Macaulay’s Basis Theorem) Let M ⊆ P r be a P -submodule with σ-Gr¨ obner basis G = {g1 , . . . , gs } ⊆ P r \ {0}, and let B denote the set of all terms in Tn he1 , . . . , er i which are not a multiple of any of the terms in the set {LTσ (g1 ), . . . , LTσ (gs )}. Then the residue classes of the elements of B form a K-basis of P r /M . Proposition 1.4 Let M ⊆ P r and let G = {g1 , . . . , gs } ⊆ P r \ {0} be a σ-Gr¨ obner basis of M ⊆ P r . Let m ∈ P r and let mG be the unique irreducible element such that G m −→ mG . Then mG is the unique element with the properties that m − mG ∈ M and Supp(mG ) ∩ LTσ {M } = ∅. In particular it does not depend on the particular σ-Gr¨ obner basis chosen. Definition 1.5 Let M ⊆ P r be a a non-zero P -submodule, and let m ∈ P r . The element mG ∈ P r described in Proposition 1.4 is called the normal form of m with respect to σ. It is denoted by NFσ,M (m), or simply by NFσ (m) if it is clear which submodule is considered. Definition 1.6 Let G = {g1 , . . . , gs } ⊆ P r \ {0}, and let M = hg1 , . . . , gs i ⊆ P r . We say that G is a reduced σ-Gr¨ obner basis of M , if the following conditions are satisfied. a) For i = 1, . . . , s we have LCσ (gi ) = 1. b) The set {LTσ (g1 ), . . . , LTσ (gs )} is a minimal system of generators of LTσ (M ). c) For i = 1, . . . , s we have Supp(gi − LTσ (gi )) ∩ LTσ {M } = ∅. Theorem 1.7 (Existence and Uniqueness of the Reduced σ-Gr¨ obner Basis) For every P -submodule M ⊆ P r there exists a unique reduced σ-Gr¨ obner basis. As an application of the existence and uniqueness of reduced σ-Gr¨obner bases we can show the existence and uniqueness of a field of definition for submodules of P r . Definition 1.8 Let K be a field, P = K[x1 , . . . , xn ] a polynomial ring, and let M ⊆ P r be a P -submodule. a) Let k ⊆ K be a subfield. We say that M is defined over k, if there exist elements in K[x1 , . . . , xn ]r which generate M as a P -module. b) A subfield k ⊆ K is called a field of definition of M , if M is defined over k and there exists no proper subfield k 0 ⊂ k such that M is defined over k 0 .

Example 1.9 Let I ⊆ C[x1 , x2 , x3 ] be the ideal generated by √ √ √ √ {x21 − 5x1 x2 + 3x1 x3 + 2 5x23 , x1 x2 − 2x23 , 2x1 x2 + 3x23 } √ √ √ Obviously, I is defined over Q[ 2, 3, 5]. But it is also easy to check that the ideal I is generated by {x21 + 3x1 x3 , x1 x2 , x23 }. Therefore, I is defined over the prime field Q of C, and the unique field of definition of I is Q.

Theorem 1.10 (Existence and Uniqueness of the Field of Definition) Let M be a non-zero P -submodule of P r . a) There exists a unique field of definition of M . b) Given any module term ordering σ, let G be the corresponding reduced σ-Gr¨ obner basis of M . Then the field of definition of M is the field generated over the prime field of K by the coefficients of the terms in the support of the vectors in G. Theorem 1.11 (Buchberger’s Algorithm) Let G = (g1 , . . . , gs ) ∈ (P r )s be a tuple of non-zero elements which generate a submodule M = hg1 , . . . , gs i ⊆ P r. For i = 1, . . . , s, we write LMσ (gi ) = ci ti eγi with ci ∈ K \ {0}, ti ∈ Tn , and γi ∈ {1, . . . , r}. Consider the following sequence of instructions. 1) Let s0 = s and B = B = {(i, j) | 1 ≤ i < j ≤ s0 , γi = γj }. 2) If B = ∅, return the result G. Otherwise, choose a pair (i, j) ∈ B and delete it from B. t

j 3) Compute Sij = ci gcd(t gi − i ,tj ) continue with step 2).

ti cj gcd(ti ,tj )

gj and NRσ,G (Sij ). If NRσ,G (Sij ) = 0,

4) Increase s0 by one. Append gs0 = NRσ,G (Sij ) to G and {(i, s0 ) | 1 ≤ i < s0 , γi = γs0 } to B. Then continue with step 2). This is an algorithm, i.e. it stops after finitely many steps. It returns a tuple G of vectors which form a σ-Gr¨ obner basis of M . The following is a very informal statement. Corollary 1.12 Syzygies By keeping track of the reductions to zero in Buchberger’s Algorithm, one can compute a generating set of Syz(M ). If M is a graded P -module one can extend the method and compute a free resolution of M . Proposition 1.13 Let K ⊆ L be a field extension and I an ideal of K[x1 , . . . , xn ]. Then IL[x1 , . . . , xn ] ∩ K[x1 , . . . , xn ] = I In particular, we have IL[x1 , . . . , xn ] = L[x1 , . . . , xn ] if and only if I = K[x1 , . . . , xn ].

Proof. Obviously we only need to prove that the left-hand side is contained in I. We choose a term ordering σ on Tn and let G = {g1 , . . . , gs } be a σ-Gr¨obner basis of I. We see that the set G is also a σ-Gr¨obner basis of the ideal IL[x1 , . . . , xn ]. Now let f ∈ IL[x1 , . . . , xn ] ∩ K[x1 , . . . , xn ]. If we compute the normal form NFσ (f ) using the Division Algorithm, we only perform operations inside K[x1 , . . . , xn ], and therefore f −NFσ (f ) is in the ideal generated in K[x1 , . . . , xn ] by the set of polynomials {g1 , . . . , gs }, which is I. But f ∈ IL[x1 , . . . , xn ] implies NFσ (f ) = 0, hence we have f ∈ I. 

1.2

Nullstellensatz

Definition 1.14 Let K ⊆ L be a field extension, let K be the algebraic closure of K, and let P = K[x1 , . . . , xn ]. a) An element (a1 , . . . , an ) ∈ Ln (which we shall also call a point of Ln ) is said to be a zero of a polynomial f ∈ P in Ln , if f (a1 , . . . , an ) = 0, i.e. if the evaluation of f at the point (a1 , . . . , an ) is zero. The set of all zeros of f in Ln will be denoted by n ZL (f ). If we simply say that (a1 , . . . , an ) is a zero of f , we mean (a1 , . . . , an ) ∈ K and f (a1 , . . . , an ) = 0. b) For an ideal I ⊆ P , the set of zeros of I in Ln is defined as ZL (I) = {(a1 , . . . , an ) ∈ Ln | f (a1 , . . . , an ) = 0 for all f ∈ I} n

Again we call the set of zeros of I in K simply the set of zeros of I and denote it by Z(I). The set Z(I) is also called the affine variety defined by I. Theorem 1.15 (Weak Nullstellensatz) Let K be a field, and let I be a proper ideal of P = K[x1 , . . . , xn ]. Then Z(I) 6= ∅. Corollary 1.16 Let L be a field which contains the algebraic closure of K, and let I be a proper ideal in P = K[x1 , . . . , xn ]. The following conditions are equivalent. a) ZL (I) = ∅ b) 1 ∈ I In particular, this result holds if K is replaced by the field of definition of I. Proof. Clearly b) implies a). For the converse, we observe that ZL (I) = ∅ implies that Z(I) = ∅, and then 1 ∈ I by the Weak Nullstellensatz.  Definition 1.17 Let K ⊆ L be a field extension, and let X ⊆ Ln . Then the set of all polynomials f ∈ K[x1 , . . . , xn ] such that f (a1 , . . . , an ) = 0 for all points (a1 , . . . , an ) ∈ X forms an ideal of the polynomial ring K[x1 , . . . , xn ]. This ideal is called the vanishing ideal of X in K[x1 , . . . , xn ] and denoted by I(X). Theorem 1.18 (Hilbert’s Nullstellensatz) Let K be an algebraically closed field, and let I be a proper ideal of K[x1 , . . . , xn ]. Then √ I(Z(I)) = I

2

SOLVING POLYNOMIAL SYSTEMS

2.1

Zero-Dimensional Ideals and Gr¨ obner Bases

In the sequel, let K be a field, let P = K[x1 , . . . , xn ], let f1 , . . . , fs ∈ P , and consider I = (f1 , . . . , fs ). Moreover, let K be the algebraic closure of K, and let P = K[x1 , . . . , xn ]. By S we shall denote the system of polynomial equations   f1 (x1 , . . . , xn ) = 0 .. .  fs (x1 , . . . , xn ) = 0 Proposition 2.1 (Finiteness Criterion) Let σ be a term ordering on Tn . The following conditions are equivalent. a) The system of equations S has only finitely many solutions. b) The ideal IP is contained in only finitely maximal ideals of P . c) For i = 1, . . . , n, we have I ∩ K[xi ] 6= (0). d) The K-vector space K[x1 , . . . , xn ]/I is finite-dimensional. e) The set Tn \ LTσ {I} is finite. f ) For every i ∈ {1, . . . , n}, there exists a number αi ≥ 0 such that xαi i ∈ LTσ (I). Proposition 2.2 (Bound for the Number of Solutions) Let f1 , . . . , fs ∈ P be polynomials which generate a zero-dimensional ideal I = (f1 , . . . , fs ). Then the system of equations f1 (x1 , . . . , xn ) = · · · = fs (x1 , . . . , xn ) = 0 n

has at most dimK (P/I) solutions in K . Proposition 2.3 (Squarefree Parts in Characteristic p) Let K be a perfect field of characteristic p > 0 having effective pth roots. Given a polynomial f ∈ K[x] \ K, consider the following sequence of instructions. 1) Calculate s1 = gcd(f, f 0 ). If s1 = 1, then return f . 2) Check whether we have s01 = 0. In this case, s1 = g p for some uniquely determined f polynomial g ∈ K[x]. Calculate g, replace f by fs1g = gp−1 , and continue with step 1). 3) Compute si+1 = gcd(si , s0i ) for i = 1, 2, . . . until s0i+1 = 0, i.e. until si+1 is a pth power si+1 = g p for some g ∈ K[x]. Then calculate g, replace f by fs1g , and continue with step 1). This is an algorithm which computes the squarefree part sqfree(f ) of f .

Proposition 2.4 (Seidenberg’s Lemma) Let K be a field, let P = K[x1 , . . . , xn ], and let I ⊆ P be a zero-dimensional ideal. Suppose that, for every index i ∈ {1, . . . , n}, there exists a non-zero polynomial gi such that gi ∈ I ∩ K[xi ] such that gcd(gi , gi0 ) = 1. Then I is a radical ideal. Corollary 2.5 (Computation of Radicals of Zero-Dimensional Ideals) Let K be a field of characteristic zero or a perfect field of characteristic p > 0 with effective pth roots. The following algorithm computes the radical of a zero-dimensional ideal I in K[x1 , . . . , xn ]. 1) For i = 1, . . . , n compute a generator gi ∈ K[xi ] of the elimination ideal I ∩ K[xi ]. 2) Compute sqfree(g1 ), . . . , sqfree(gn ) and return

I + (sqfree(g1 ), . . . , sqfree(gn )).

Example 2.6 Let us consider the following system S over P = Q[x, y, z].  2 29 2 2 7 7 2 xz + 22 y − 375  75 yz + 1500 z + 75 x − 15 y − 150 z = 0   31 1 2 8 1   xy + 150 xz − 51 yz − 50 z − 15 x + y + 10 z=0     x2 − 7 xz + 4yz + 2 z 2 + 7 x − 20y − 2z = 0 15

5

3

2  z 3 + 56 xz + 24yz − 38  5 z − 6x − 120y + 13z = 0    3 2 3 3  yz 2 − 3 xz − 47  5 yz + 50 z + 5 x + 22y − 10 z = 0   2 25 xz − 7xz + 10x = 0

The associated ideal I ⊆ P , i.e. the ideal generated by the left-hand sides of the equations in S, satisfies dimQ (P/I) = 7, as we can compute for instance using Buchberger’s Algorithm and Macaulay’s Basis Theorem. The upper bound for the number of solutions of S given by Proposition 2.2 is therefore 7. Now we compute the three generators g1 , g2 , and g3 of the elimination ideals I ∩ Q[x], 1 2 1 I ∩ Q[y], and I ∩ Q[z]. We obtain g1 = x4 + 2x3 − 3x2 , g2 = y 4 + 45 y 3 + 20 y − 20 y, and 4 3 2 3 2 g3 = z − 12z + 45z − 50z. Then we calculate sqfree(g1 ) = x + 2x − 3x, sqfree(g2 ) = 1 3 2 y 3 + 10 y − 10 y, and sqfree(g3 ) = z 3 − 7z 2 + 10z. By adding these three equations to the system, we obtain a larger system of polynomial equations with the same set of solutions. Let us try to use Proposition 2.2 again to bound the number of solutions. By computing a Gr¨ obner basis of the associated ideal, we see that the new system of equations can be replaced by  2 6 z + 5 x + 24y − 13  5 z =0   7 6  2  x + 5 x − 12y − 5 z = 0     xy − 3 x + 3 y + 3 z = 0 25

5

50

7 9 7  y 2 + 250 x + 25 y − 500 z=0      xz − 2x = 0    3 3 yz − 25 x − 22 5 y + 50 z = 0

Macaulay’s Basis Theorem yields dimQ (P/J) = 4 i.e. we get a better bound for the number of solutions. Is this a good bound? Our next goal is to show that it is in fact optimal, i.e. that 4 is exactly the number of solutions of S. Theorem 2.7 (Exact Number of Solutions) Let I be a zero-dimensional radical ideal in P , let K be the algebraic closure of K, and let P = K[x1 , . . . , xn ]. If K is a perfect field, the number of solutions of the system of equations S is equal to the number of maximal ideals of P containing IP , and this number is precisely dimK (P/I). Example 2.8 Consider the second system of polynomial equations contained in Example 2.6. Notice that the equation xz − 2x = 0 can be factored into x(z − 2) = 0. As in high school, we can split the system into two systems S1 and S2 . After interreducing the generators, these systems are given by   z − 2 = 0 x = 0 x + 20y − 1 = 0 and y+ 1z=0  2 1  2 10 y − 5y = 0 z − 5z, = 0 Now it is easy to get the four solutions (1, 0, 2), (−3, 51 , 2), (0, 0, 0), (0, − 12 , 5). Proposition 2.9 Suppose that I is a zero-dimensional ideal in P , let t = dimK (P/I),  and assume that the field K contains more than 2t elements. Then there exists a tuple (c1 , . . . , cn−1 ) ∈ K n−1 such that c1 a1 + · · · + cn−1 an−1 + an 6= c1 b1 + · · · + cn−1 bn−1 + bn n

for all pairs of zeros (a1 , . . . , an ), (b1 , . . . , bn ) ∈ K of I. Consequently the linear change of coordinates given by x1 7→ x1 , . . . , xn−1 7→ xn−1 , and by xn 7→ xn − c1 x1 − · · · − cn−1 xn−1 transforms I into an ideal in normal xn -position. Theorem 2.10 Let K be a perfect field, let P = K[x1 , . . . , xn ], let I ⊆ P be a zerodimensional radical ideal, and let gn be the monic generator of the ideal I ∩ K[xn ]. Then the following conditions are equivalent. a) The ideal I is in normal xn -position. b) The degree of gn is equal to dimK (P/I). c) The injection K[xn ] ,−→ P induces an isomorphism K[xn ]/(gn ) ∼ = P/I.

Remark 2.11 Let L/K be a finite field extension, i.e. let L ⊇ K be an extension field which is a finite dimensional K-vector space. Since every K-basis of L is also a system of generators of the K-algebra L, there exists a presentation L ∼ = P/I, where P = K[x1 , . . . , xn ] is a polynomial ring over K, and where I ⊆ P is a maximal ideal.  Let t = dimK (L). If the field K is perfect and has more than 2t elements, we can use Proposition 2.9 to find a linear change of coordinates such that the transform of I is in normal xn -position. Then condition c) of the theorem shows that L is of the form L ∼ = K[x]/(g) with an irreducible polynomial g ∈ K[x]. In other words, viewed as a K-algebra, the field L is generated by a single element. In field theory, this result is called the Primitive Element Theorem. The assumption that K is a perfect field is essential for  the Primitive Element Theorem t to hold, whereas the assumption that K has more than 2 elements can easily be dispensed with. Theorem 2.12 (The Shape Lemma) Let K be a perfect field, let I ⊆ P be a zero-dimensional radical ideal in normal xn position, let gn ∈ K[xn ] be the monic generator of the elimination ideal I ∩ K[xn ], and let d = deg(gn ). a) The reduced Gr¨ obner basis of the ideal I with respect to Lex is of the form {x1 − g1 , . . . , xn−1 − gn−1 , gn }, where g1 , . . . , gn−1 ∈ K[xn ]. b) The polynomial gn has d distinct zeros a1 , . . . , ad ∈ K, and the set of zeros of I is Z(I) = {(g1 (ai ), . . . , gn−1 (ai ), ai ) | i = 1, . . . , d} Corollary 2.13 (Solving Systems Effectively) Let K be a field of characteristic zero or a perfect field of characteristic p > 0 having effective pth roots. Furthermore, let f1 , . . . , fs ∈ P = K[x1 , . . . , xn ], and let I = (f1 , . . . , fs ). Consider the following sequence of instructions. 1) For i = 1, . . . , n, compute a generator gi of the elimination ideal I ∩ K[xi ]. If gi = 0 for some i ∈ {1, . . . , n}, then return "Infinite Solution Set" and stop. 2) Compute hi = sqfree(gi ) for i = 1, . . . , n. Then replace I by I + (h1 , . . . , hn ). 3) Compute d = #(Tn \ LTσ {I}). 4) Check if deg(hn ) = d. In this case, let (c1 , . . . , cn−1 ) = (0, . . . , 0) and continue with step 8).  5) If K is finite, enlarge it so that it has more than d2 elements. 6) Choose (c1 , . . . , cn−1 ) ∈ K n−1 . Apply the coordinate transformation x1 7→ x1 , . . ., xn−1 7→ xn−1 , xn 7→ xn − c1 x1 − · · · − cn−1 xn−1 to I and get an ideal J. 7) Compute a generator of J ∩ K[xn ] and check if it has degree d. If not, repeat steps 6) and 7) until this is the case. Then rename J and call it I.

8) Compute the reduced Gr¨ obner basis of I with respect to Lex. It has the shape {x1 − g1 , . . . , xn−1 − gn−1 , gn } with polynomials g1 , . . . , gn ∈ K[xn ] and with deg(gn ) = d. Return the tuples (c1 , . . . , cn−1 ) and (g1 , . . . , gn ) and stop. This is an algorithm which decides whether the system of polynomial equations S given by f1 = · · · = fs = 0 has finitely many solutions. In that case, it returns tuples (c1 , . . . , cn−1 ) ∈ K n−1 and (g1 , . . . , gn ) ∈ K[xn ]n such that, after we perform the linear change of coordinates x1 7→ x1 , . . . , xn−1 7→ xn−1 , xn 7→ xn − c1 x1 − · · · − cn−1 xn−1 , the transformed system of equations has the set of solutions {(g1 (ai ), . . . , gn−1 (ai ), ai ) | i = 1, . . . , d}, where a1 , . . . , ad ∈ K are the zeros of gn . In other words, the original system of equations has the set of solutions {(g1 (ai ), . . . , gn−1 (ai ), ai − c1 g1 (ai ) − · · · − cn−1 gn−1 (ai )) | i = 1, . . . , d}

2.2

Points and Buchberger M¨ oller-Algorithm

The simplest example of a non-empty affine variety and its defining ideal is given when p ∈ Kn is a rational point, i.e. a point with coordinates in K, and X = {p}. If p has coordinates (a1 , a2 , . . . , an ), then I(X) = I(p) = (x1 − a1 , x2 − a2 , . . . , xn − an ). It is also clear that P/I(p) ∼ = K. Let now consider a “next step” in our investigation, i.e. when p1 , p2 , . . . , ps ∈ Kn and X = {p1 , p2 , . . . , ps }. Lemma 2.14 Let p1 , p2 , . . . , ps ∈ Kn and X = {p1 , p2 , . . . , ps }. Then a) I(X) = I(p1 ) ∩ I(p2 ) ∩ . . . ∩ I(pn ) b) I(X) can be computed using syzygies. c) The field of definition of I(X) is contained in K. Condition b) suggests a way of computing ideals of finite sets of points. However this is definitely not a good way of doing that. A good one is the method based on the Buchberger-M¨oller Algorithm, which we present here. But before we say something more about points, we recall: Proposition 2.15 (Chinese Remainder Theorem) Let A be a ring, I1 , I2 , . . . , Is ideals in A and consider the following canonical homomorphism Φ : A/(∩j Ij ) → ⊕j (A/Ij ). Then a) Φ is injective b) If the ideals Ij are pairwise comaximal, Φ is an isomorphism. Proof. Let J = ∩j Ij . Then Φ is defined by Φ(a mod J) = (a mod I1 , . . . , a mod Is ), hence it is clearly injective. Let us prove b). First we observe that if we call Ji = ∩r6=i Ir , then for every i the two ideals Ii and Ji are comaximal. This follows from the fact that if a maximal ideal contains Ji , then it must contain at least one of the Ir . Therefore, for every i = 1, . . . , s there exists αi ∈ Ii , βi ∈ Ji , such that αi + βi = 1. Now P let M = (m1 mod I1 , . . . ms mod Is ) ∈ ⊕j (A/Ij ). Then it is easy to see that M = Φ( mi βi ).  Remark 2.16 We have seen that the proof of Proposition 2.15 uses the βi , which show up in the equalities αi + βi = 1. But it is clear that what is actually needed is a set of βi with the property that βi = 0 mod Ij for j 6= i and βi = 1 mod Ii . Incidentally, if we have such P elements, we may recover Pthe αi in the following way. We see Pthat j6=i βj +βi = 1 mod Ir for every r. Then α = j6=i βj + βi − 1 ∈ ∩r Ir . Let αi = j6=i βj − α; then αi + βi = 1 for every i and αi ∈ Ii .

Corollary 2.17 Let p1 , p2 , . . . , ps be distinct points in K n and X = {p1 , p2 , . . . , ps }. Then P/I(X) ∼ = Ks Proof. We have already said that P/I(p) ∼ = K for every rational point p. The conclusion follows from Lemma 2.15, since the ideals I(pi ), being maximal ideals, are pairwise comaximal.  Corollary 2.18 Let p1 , p2 , . . . , ps be distinct points in K n and X = {p1 , p2 , . . . , ps }. Let σ be a term ordering and Oσ (I(X)) the order ideal of monomials associated to σ and I(X). Then #(Oσ (I(X))) = s Proof. It follows from the theory of Gr¨obner bases.



The elements βi used in lemma 2.15 and the subsequent Remark behave as a canonical basis. When we deal with finite sets of points, this kind of element plays an important rˆole, so we are lead to the following Definition 2.19 Let p1 , p2 , . . . , ps be distinct points in K n and X = {p1 , p2 , . . . , ps }. A polynomial spi ∈ P is said to be a separator of pi from X if spi (pi ) = 1 and spi (pj ) = 0 for every j 6= i. Corollary 2.20 Let p1 , p2 , . . . , ps be distinct points in K n and b1 , . . . , bs ∈ K. Then there exists a polynomial f ∈ P such that s(pi ) = bi for i = 1, . . . , s. In particular there exist separators. Proof. It is another way of reading Corollary 2.17.



Definition 2.21 A polynomial f ∈ P such that f (pi ) = bi for i = 1, . . . , s as above is called an interpolator or an interpolating polynomial of the values b1 , . . . , bs at the points p1 , p2 , . . . , ps . Corollary 2.22 Let p1 , p2 , . . . , ps be distinct points in K n and X = {p1 , p2 , . . . , ps }. Then let T P ⊆ X, Y = X\T, and for every p ∈ T let sp be a separator of p from X. Finally let sT = p∈T sp ; then I(Y) = (I(X), sT ). Proof. Clearly I(Y) ⊇ ((I(X), sT ). It is also clear that 1−sT vanishes on T. Consequently if f ∈ I(Y), then f (1 − sT ) ∈ I(X), whence the conclusion follows.  Definition 2.23 Let X1 , X2 , . . . , Xn be finite subsets of K. In this case the cartesian product X1 × X2 × · · · × Xn ⊂ K n is a finite set of points called a complete design.

Lemma 2.24QLet X1 = {a11 , . . . a1r1 }, .Q . . Xn = {an1 , . . . anrn }, with aij ∈ K for every i, j, n 1 and let f1 = ri=1 (x1 − a1i ), . . . , fn = ri=1 (xn − ani ). Let us denote by D the complete design X1 × X2 × · · · × Xn . Then 1) I(D) = (f1 , . . . , fn ). 2) {f1 , . . . , fn } is the reduced σ-Gr¨ obner basis of I(D) for every term ordering σ. Proof. To prove 1) it suffices to show that (f1 , . . . , fn ) = ∩p∈X I(p). To prove 2) we observe that the fi are univariate polynomials, hence they are uniquely ordered independently of σ and their leading terms are pairwise coprime.  Lemma 2.25 Let X ⊂ K n be a finite set of points. Then there exists a unique minimal complete design DX containing X. Proof. For i = 1, . . . , n let Xi = {ai1 , . . . airi } be the set of the ith coordinates of the points in X. Then clearly DX = X1 × X2 × · · · × Xn is the solution.  Theorem 2.26 Let s be a positive integer, p1 , p2 , . . . , ps be distinct points in K n and X = {p1 , p2 , . . . , ps }. Then p 1) For every term ordering σ, LTσ (I(X)) = (x1 , x2 , . . . , xn ). 2) It is possible to generate I(X) with n + 1 polynomials. Proof. Let D be the minimal design containing X. Then I(D) is generated by n polynomials by Lemma 2.24 and contained in I(X). This implies that a suitable power of every indeterminate is in LTσ (I(X)), whence 1) follows. To prove 2) it suffices to apply Corollary 2.22 to the pair (D, X).  We are ready to study the Buchberger-M¨oller Algorithm (BM-Algorithm). For a deep investigation on this topic see [AKR01]. We explain the idea informally. Let us consider a finite set of points X = {p1 , . . . , ps } ⊂ Kn and let us fix a term ordering σ. Let T n be the monoid of all power products in x1 , . . . , xn . Then T n = Oσ (I(X)) ∪ LTσ (I(X)) and the union is disjoint. On the other hand the set Oσ (I(X)) is finite (it has cardinality s) by Corollary 2.17 and the generators of LTσ (I(X)) are on the border of Oσ (I(X)), in the sense that if t is a generator of LTσ (I(X)) and t = xi s, then s ∈ Oσ (I(X)). Therefore the region to be explored in order to find Oσ (I(X)) and the generators of LTσ (I(X)) is a finite region and the idea is to explore it moving one step at a time from the σ-smallest power product, which is 1. The exploration will eventually find the reduced σ-Gr¨obner basis of I(X), the set Oσ (I(X)) and a set of separators from X for the points p1 , . . . , ps .

Algorithm C: Classical Buchberger-M¨oller Algorithm Let K be a field, let n ≥ 1, let σ be a term ordering on the power products Tn of K[x1 , . . . , xn ], and let pi = (pi1 , . . . , pin ) ∈ Kn for i = 1, . . . , s. Consider the following sequence of instructions. C1 Start with empty lists G = [ ], O = [ ], S = [ ], a list L = [1], and a matrix M = (mij ) over K, with s columns and initially zero rows. C2 If L = [ ], return the pair [G, O] and stop. Otherwise, choose the power product t = minσ (L), the smallest according to the ordering σ. Delete t from L. C3 Compute the evaluation vector (t(p1 ), . . . , t(ps )) ∈ Ks , and reduce it against the rows of M to obtain X (v1 , . . . , vs ) = (t(p1 ), . . . , t(ps )) − ai (mi1 , . . . , mis ) ai ∈ K i

P C4 If (v1 , . . . , vs ) = (0, . . . , 0) then append the polynomial t − i ai si to the list G, where si is the ith element of S. Remove from L all multiples of t. Continue with step C2. P C5 If (v1 , . . . , vs ) 6= (0, . . . , 0), then add (v1 , . . . , vs ) as a new row to M , and t− i ai si as a new element to S. Append the power product t to O, and add to L those elements of {x1 t, . . . , xn t} which are neither multiples of an element of L nor of LTσ (G). Continue with step C2. Theorem 2.27 Algorithm C stops after finitely many steps. It returns the reduced σGr¨ obner basis G of the ideal I(X), and the list of power products O = Oσ (I(X)) whose residue classes form a K-basis of K[x1 , . . . , xn ]/I. With only a slight change, Algorithm C can also yield the separators fi of the points pi . To do this replace step C2 by the following: C2’ If L = [ ], reduce M to an identity matrix by row operations mimicking these row operations on the elements of S, and then return the triple [G, Q, S] and stop. Otherwise, choose the power product t = minσ (L) and delete it from L. Corollary 2.28 Algorithm C with step C2’ additionally computes a set of separators for the points pi ; these separators are reduced with respect to the Gr¨ obner basis G. Proof. Observe that throughout Algorithm C the matrix M and list S are related as follows: row i of M is just (si (p1 ), si (p2 ), . . . , si (ps )) where si is the ith element of the list S. Moreover, the last time step C2’ is entered M is a non-singular s × s matrix. The linear combinations of the rows of M yielding the identity matrix clearly also convert the elements of S into separators. 

3

APPLICATIONS TO STATISTICS

Now, we try to describe a fruitful interaction between Design of Experiments (DoE) on one side, and Commutative Algebra, Algebraic Geometry and Computer Algebra on the other. DoE is a branch of Statistics, and let us have a look at the main concepts in DoE by discussing an example. Suppose that a firm wants to introduce a new product, say a portable telephone, into the market. They need to obtain useful a priori information from the potential customers. Let’s say that the strategy of the firm is to test the reaction to five features of the telephone: colour, shape, weight, material and price. Let us associate the variables c with colour, s with shape, w with weight, m with material, p with price and suppose that for each variable, three values are chosen, which can be coded as {0, 1, 2}. The interpretation is, for instance, that c = 0 means red, c = 1 means green, c = 2 means black; w = 0 means weight in between 200 and 250 grams, and so on. Therefore the particular characteristics of a portable telephone are codified by a point whose coordinates are five integers chosen from {0, 1, 2}. So the point (0, 0, 0, 0, 0) could correspond to a red, rounded, light, plastic, low priced telephone, while the point (0, 1, 1, 0, 2) could correspond to a red, slanted, medium weight, plastic, high priced telephone, and so on. A selected set of potential customers should be asked to rate the various possibilities, i.e. the various points, on a scale from 0 to 10, say. Similar examples can be drawn from several fields of application, ranging from industrial design to chemistry, marketing, and finance. The method is always to draw information from experiments, and the goal can be to gain information from the market, to detect failures in a complex system, to find the correct blend of substances in a chemical product and so on. The nice feature of these examples is that all of them can be modeled in the same way, that is to say with a finite set of experimental treatments or points, which are described by their coordinates. The set of all these points is called a Design. A complete set where all combinations of values are allowed, such as the 35 points in our example, is a product set called in Statistics a Full (or Full Factorial ) Design (see Definition 2.23). Let us go back to our example. We call D the set of the 35 = 243 points, and suppose for a moment that we can ask the customers to rate all of them. Having done that, to each point we associate for instance the mean value of the answers, and we get a function y : D → Q (the choice of the mean value as an example is motivated by the fact that we are going to treat models without replications). We observe that every function from a finite set of points to a field is a polynomial function, hence y = f (c, s, w, m, p) with f a polynomial in Q[c, s, w, m, p]. Such a polynomial function is called a polynomial model for our design. It encodes all the relevant information, and from it the firm can deduce fundamental knowledge for the final decision on the characteristics of the portable telephone. However, in this case (as well as in other fundamental situations) there is an obstacle: no potential customer would be willing to rate 243 cases. The problem is to determine a suitable subset F of D, called a Fraction, from which we can equally well reconstruct a good model. For instance it would be nice to be able to identify the model, simply from

rating a small subset of D. Before we go on let’s introduce some commutative algebra. The following polynomial x(x−1)(x−2) = x3 −3x2 +2x vanishes on the set {0, 1, 2}. This fact can be interpreted as saying that the polynomial functions x3 , x2 , x are linearly dependent over {0, 1, 2}. In fact the function x3 takes the same values as 3x2 − 2x, which, in statistical jargon, is expressed by saying that x3 and 3x2 − 2x are confounded by {0, 1, 2}. Also in the multivariate case a full design has canonical confounding polynomials (or equations) and a canonical basis of power-products, O(D). In our example (henceforth we use x1 , x2 , x3 , x4 , x5 instead of C, S, W, M, P ) the canonical confounding polynomials are fi := xi (xi − 1)(xi − 2), i := 1, . . . , 5. All together they generate I(D), the defining ideal of D . The canonical basis O(D) is the set of 243 power-products {xa11 xa22 · · · xa55 | ai = 0, 1, 2; i := 1, . . . , 5}. Given a full design D, a fraction F of it is simply a subset, but from the algebraic point of view, its description is no longer canonical. Of course the ideal I(F), which defines the given fraction, contains I(D), the defining ideal of D, simply because every polynomial which vanishes on D automatically vanishes on F. So, there are more confounding polynomials, which have to be joined to the canonical polynomials of D, and the key point is to be able to draw information about a given fraction from its confounding polynomials. The first attempt to solve the problem of identifying a model from a fraction can be done easily by linear algebra methods. But we want much more. For instance, if we go back to our guiding example, we may envisage the following potential situations. The first case is that the firm wants to use a particular fraction F of D. The reason could be that they have already run previous experiments for similar purposes. In this case the question is: given a fraction, what are the models identifiable by it? This is our Problem 1 of Section 3.2. The second, and most important case is that the firm already knows the support of the model, i.e. the set of the power-products actually occurring in the polynomial which describes the model. It is clear that such support is contained in O(D), so, to identify the model, it suffices to identify the coefficients. For instance the coefficient of x1 quantifies the importance that the customers give to the colour of the telephone. The coefficient of x1 x2 quantifies the importance given to the interaction between colour and shape and so on. Indeed in Statistics the indeterminates are called main effects and the other powerproducts, which involve more than one indeterminate, interactions. Suppose that via some previously acquired knowledge, the firm knows that only the constant and the main effects are relevant; then the model has the shape y = a0 + a1 x1 + a2 x2 + a3 x3 + a4 x4 + a5 x5 . One has to identify only six coefficients. A first basic remark and Corollary 2.18 shows that it suffices to consider fractions with six points, but then the question is: what are the fractions of D, which identify the six coefficients? This is our Problem 2 of Section 3.2, one of the most important in DoE. At this point Gr¨obner bases are most welcome. To get further knowledge on the subject, please read [R98] and [R01]

3.1

Identifying the Models

In Subsection 2.2 we have seen that every function on a finite set of points can be represented by a polynomial function and we have seen how to compute such a polynomial. These facts are relevant to DoE, since its main goal is to construct good models from a subset of a full design. Now it is time to give the following fundamental definitions: Definition 3.1 Given a full design D, a fractional factorial design is a proper subset F of D. It is also called a fraction of D. Its defining ideal I(F) contains the ideal I(D). Definition 3.2 Any set of polynomials that, added to the canonical polynomials of a full design D, generates the ideal of a fraction F, is called a set of confounding polynomials of F in D. Definition 3.3 Let D be a full design and F a fraction of it. Then we call characteristic polynomial of F ⊂ D a polynomial f such that f (p) = 0 for every p ∈ F and f (p) = 1 for every p ∈ D\F. We have seen in Theorem 2.24 that I(D) is generated by {f1 , . . . , fn }, which is a universal Gr¨obner basis, and as a consequence of Corollary 2.22 and Theorem 2.26 we have the following: Proposition 3.4 Let D be a full design, F a fraction of it, and {f1 , . . . , fn } the canonical polynomials of I(D). Let f be a characteristic polynomial of F. Then I(F) = (f1 , . . . , fn , f ). Theorem 3.5 Let D be a full design, f1 , . . . , fn the canonical polynomials (as described in Theorem 2.24), and F a fraction of it. Then there exists a unique characteristic polynomial f of F in D such that degXi (f ) < deg(fi ) for every i := 1, . . . , n. It will be called the canonical confounding polynomial of F in D. Proof. Let f, g be two characteristic polynomials of F in D. Then, by definition, f − g vanishes on D, hence it belongs to the ideal I(D) = (f1 , . . . , fn ). The set {f1 , . . . , fn } is a universal Gr¨obner basis, hence the theory tells us that f and g have the same normal form with respect to it and such a normal form is the remainder of the division of f (or of g) with respect to {f1 , . . . , fn }. The conclusion follows immediately.  Remark 3.6 In DoE it is more common to use the indicator polynomial of F ⊂ D, which is a polynomial such that f (p) = 1 for every p ∈ F and f (p) = 0 for every p ∈ D\F. Of course the indicator polynomial and the characteristic polynomial sum to 1 on D, so that it is clear how to switch from one to the other.

Example 3.7

Let D be a 23 full design. Its canonical polynomials are f1 := x(x − 1), f2 := y(y − 1), f3 := z(z − 1)

Let F := {(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 0, 1), (0, 1, 1)} be a fraction of D. Let p := (0, 1, 0), q := (0, 0, 1), r := (1, 1, 1). Then D = F ∪ {p, q, r}. Using the BM-algorithm we find the separators sp = y − yz − xy + xyz, sq = z − yz − xz + xyz, sr = xyz respectively of the points p, q, r from D. Therefore the polynomial sp +sq +sr = y + z − xy − xz − 2yz + 3xyz is the canonical confounding polynomial of F in D.

It is important to observe that all the information of F ⊂ D is completely contained in its canonical confounding polynomial. In DoE this point of view has not been fully explored yet. On the other hand, the beauty of encoding the information into the single canonical confounding polynomial is somehow overshadowed by the fact that directly from it one cannot easily extract further useful information. Definition 3.8 Let F be a fraction of a full design D. Let σ be a term-order and Gσ the corresponding reduced Gr¨ obner basis of I(F). The set of those polynomials in Gσ , which are not among the canonical polynomials of D is called the σ-canonical set of confounding polynomials of F in D. Moreover we use the notation Oσ (F) instead of Oσ (I(F)). Proposition 3.9 Let F be a fraction of a full design D. Let σ be a degree-compatible term-order. Then the BM-algorithm computes Oσ (F) with elements of minimal degree. Proof. In fact the order which the power-products enter the computation is −σ (i.e. σ-smallest first).  Let D be a 23 full design and let F be the following fraction of it: {(1, 0, 0), (1, 1, 0), (1, 0, 1), (0, 1, 1)}.

Example 3.10

If σ := DegRevLex, then the σ-canonical set of confounding polynomials of F in D is {xy − x − y + 1, xz − x − z + 1, yz + x − 1}. And Oσ (F ) = {1, x, y, z}. Alternatively, if σ := Lex, then the σ-canonical set of confounding polynomials of F in D is {x + yz − 1}. And Oσ (F ) = {1, y, z, yz}. Let D be a 23 full design and let F be the following fraction of it (as in Example 3.7): {(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 0, 1), (0, 1, 1)}.

Example 3.11

If either σ := DegRevLex or σ := Lex, then the σ-canonical set of confounding polynomials of F in D is {xy + yz − y, xz + yz − z}. In both cases Oσ (F ) = {1, x, y, z, yz}.

Now we are ready to enter the “true” game. We have seen that over the fractions of a full design all the functions are polynomial. But what kind of polynomials are to be considered? Let us see what happens with the data of Example 3.10. For instance we know that the polynomial x + yz − 1 vanishes on F := {(1, 0, 0), (1, 1, 0), (1, 0, 1), (0, 1, 1)}.

This implies that the two polynomial functions x and 1 − yz take the same values on F, i.e. they are confounded by F. On the other hand the 4 points are not planar, hence no linear combination of 1, x, y, z except the trivial one vanishes on F. In other words, the polynomial functions 1, x, y, z on F are linearly independent. Now suppose that we are looking for a linear combination of these four polynomial functions, which interpolates some given values at the four points, for instance 2, 5, −3, −6. We have learned that the interpolation problem can be solved via separators, but there is another way to proceed. Namely we take a generic linear combination a+bx+cy +dz of these four functions, where a, b, c, d are independent parameters. Then we evaluate it at the four points, and impose the given values. We get a + b = 2, a + b + c = 5, a + b + d = −3, a + c + d = −6, which is a non-singular linear system. The unique solution is a = −4, b = 6, c = 3, d = −5. In this case, by evaluating the polynomial a + bx + cy + dz at the four points of F, we could identify it, i.e. we could identify its coefficients. Example 3.12

Let us go back to Example 3.7 . O(D) = {1, x, y, z, xy, xz, yz, xyz}, therefore to compute the canonical confounding polynomial of F in D we can also proceed as follows. We consider the generic linear combination a1 +a2 x+a3 y+a4 z+a5 xy+a6 xz+a7 yz+ a8 xyz and we impose that its evaluation at the points of F is 0, and at the points of D\F is 1. We get the following linear system a1 = 0, a1 +a2 = 0, a1 +a2 +a3 +a5 = 0, a1 +a2 +a4 +a6 = 0, a1 +a3 +a4 +a7 = 0, a1 +a3 −1 = 0, a1 +a4 −1 = 0, a1 +a2 +a3 +a4 +a5 +a6 +a7 +a8 −1 = 0, from which we deduce a1 = 0, a2 = 0, a3 = 1, a4 = 1, a5 = −1, a6 = −1, a7 = −2, a8 = 3, hence we find f := y + z − xy − xz − 2yz + 3xyz, in agreement with Example 3.7.

Once again in this example, by evaluating a1 + a2 x + a3 y + a4 z + a5 xy + a6 xz + a7 yz + a8 xyz at the eight points of D, we could identify it, i.e. we could identify its coefficients. The considerations made before and these examples lead to the following: Definition 3.13 A polynomial model is a polynomial function, i.e. y = f (x1 , . . . , xn ), where f (x1 , . . . , xn ) ∈ k[x1 , . . . , xn ]. If k ⊂ K is a field extension and f (x1 , . . . , xn ) ∈ K[x1 , . . . , xn ] is such that its coefficients are algebraically independent over k, then y = f (x1 , . . . , xn ) is called a linear family of polynomial models or simply a linear polynomial model. The information of a linear polynomial model is encoded in the support of the polynomial, so we can call linear polynomial model the vector space generated by a set of power-products and call support of the model such a set. Example 3.14

Let k := Q, K := Q(a, b, c), where a, b, c are transcendental over Q. Then y = ax2 − bxyz + c is a linear polynomial model. Its support is {x2 , xyz, 1}

Definition 3.15 Let y = f (x1 , . . . , xn ) be a linear polynomial model and consider a set X := {p1 , p2 , . . . , ps } of distinct points of k n . We say that y is identifiable by X if, by evaluating it at the given points, we can uniquely determine its coefficients.

In practice it is common to use polynomial models, whose support is a standard set of power-products. The reason is quite clear: if for a given problem it is relevant to evaluate a power-product, it seems to be even more relevant to evaluate its divisors. Definition 3.16 A complete (linear) polynomial model is a (linear) polynomial model, whose support is a standard set of power-products.

Now we can rephrase some results with the new terminology, in particular we can interpret Theorem 2.17 and Corollary 2.18. Theorem 3.17 Let p1 , p2 , . . . , ps be distinct points in k n and denote by X the set of points {p1 , p2 , . . . , ps }. then let y = f (x1 , . . . , xn ) be a linear polynomial model, with support S := {t1 , . . . , tr } and let X(S, X ) be the (s × r)-matrix whose element in position (i, j) is tj (pi ), the evaluation of tj at pi . then the following conditions are equivalent 1) y is identifiable by X 2) rk(X(S, X )) = r 3) S is a set of linearly independent functions on X . Proof. We know from Theorem 2.17 that the dimension of the vector-space of polynomial functions over X is s, therefore the equivalence is simply a statement in elementary linear algebra.  Remark 3.18 In DoE matrices such as X(S, X ) above are called matrices of the model. Theorem 3.19 Let p1 , p2 , . . . , ps be distinct points in k n and denote by X the set of points {p1 , p2 , . . . , ps }. Let σ be a term-order and Oσ (I(X )) the standard set of power-products associated to σ and I(X ). then every polynomial model, whose support is contained in Oσ (I(X )), is identifiable by X . Proof.

It follows immediately from Theorem 3.17 and Corollary 2.18.

In the following section some important problems in DoE are discussed.



3.2

Some Problems

At this point many natural and fundamental questions arise. Let D be a full design with levels (l1 , l2 , . . . , ln ). We know that its cardinality is #(D) = l1 × l2 × · · · × ln , which is also #(O(D)). problem 1: Let F := {p1 , p2 , . . . , ps } ⊂ D. What are the complete linear polynomial models which can be identified by F? problem 2: Let y = f (x1 , . . . , xn ) be a complete linear polynomial model, whose support is contained in O(D)). What are the minimal fractions, which identify y? It is important to understand the relation between Problem 1 and the algebraic methods, from several points of view. As we mentioned in the introduction, this connection was the first one discovered between DoE and Gr¨obner bases (see Pistone and Wynn (1996)). A first step is to observe that Problem 1 can be rephrased in the following way: Problem 10 : Let F := {p1 , p2 , . . . , ps } ⊂ D. What are the possible standard sets of power-products, which are bases of R/I(F) as a k-vectorspace? Actually the exact rephrasing of Problem 1 would be the following. Let F := {p1 , p2 , . . . , ps } ⊂ D; what are the standard sets of power-products, which are k-linearly independent as sets of elements of R/I(F)? But, of course, if we are able to find the maximal ones, we get the non-maximal ones for free. A first step can easily be made by linear algebra methods. Namely, suppose we have F := {p1 , p2 , . . . , ps } ⊂ D and a standard set of power-products O ⊂ O(D) (or even any set of power-products). If we want to check whether or not it is a basis of R/I(F) as a k-vectorspace, it suffices to construct the matrix X(O, F) (see Theorem 3.17). The answer is: Corollary 3.20 The following conditions are equivalent a) O is a basis of R/I(F). b) X(O, F) is invertible. Proof.

It follows immediately from Theorem 3.17.



However, Problem 10 looks for a complete answer, in the sense that we want to find all the standard sets of power-products, which are bases of R/I(F) as a k-vectorspace. A good approach seems to be the theory of Gr¨obner bases; indeed one can use Corollary 2.18 and produce many solutions to the problem. But now the question is: how many? It is well-known that, given an ideal I, for every term-order σ there is a unique standard set of power-products Oσ (I). The term-orders are infinite in number, but the standard sets of power-products associated to I are finite, and each one is a basis of R/I as a k-vectorspace. Are they all? Let us have a look at the following important example:

Let D be a 32 full design. Since x3 , y 3 are the leading terms of the canonical polynomials, no matter which values of the coordinates are chosen, it turns out that O(D) = {1, x, y, x2 , xy, y 2 , x2 y, xy 2 , x2 y 2 }

Example 3.21

We pick the fraction F := {(0, 0), (0, −1), (1, 0), (1, 1), (−1, 1)} of D and let O := {1, x, y, x2 , y 2 }, which is a standard set of power-products. We construct 01 B1 B X(O, F ) = B 1 @ 1 1

0 0 1 1 −1

0 −1 0 1 1

0 1 1 1 1

01 1C C 0C A 1 1

whose determinant is −4, hence we use Corollary 3.20 and conclude that O is a basis of R/I(F ). The polynomial f := x2 + xy − x − 21 y 2 − 12 y is in the defining ideal of F (this fact can be easily seen by checking that f vanishes at the 5 points of F ). Let σ be a termorder. From the definition of term-order it follows that x2 >σ x and y 2 >σ y. Now if x >σ y, then x2 >σ xy >σ y 2 . If y >σ x, then y 2 >σ xy >σ x2 . This means that there are only two possibilities for the leading term of f , either it is x2 or y 2 . The conclusion is that O := {1, x, y, x2 , y 2 } cannot be Oσ (F ), whatever σ we choose.

The above example shows that the theory of Gr¨obner bases provides a good approach to the problem, but it does not always yield the full solution. More work has to be done.

Now we are going to discuss the more important and more complicated Problem 2, which can be reformulated as: Problem 20 : Let O ⊂ O(D) be a standard set of power-products. What are the fractions F of D, such that O is a basis of R/I(F) as a k-vectorspace? Definition 3.22 Let O ⊂ O(D) be a standard set of power-products. If σ is a term-order, then we denote by Sol(O, σ) the set of those fractions F, such that Oσ (F) = O, hence Oσ (F) is a basis of R/I(F) as a k-vectorspace. We denote by Sol(O, ∃σ) the set of those fractions F, such that there exists a termorder σ with Oσ (F) = O, hence Oσ (F) is a basis of R/I(F) as a k-vectorspace. We denote by Sol(O) the set of those fractions F, such that O is a basis of R/I(F) as a k-vectorspace, hence Sol(O) is the set of all the solutions to Problem 20 . Example 3.21 shows explicitly that Sol(O, ∃σ) ⊂ Sol(O) may happen. This fact induces several other questions. For instance, we would like to be able to find easy classes of solutions. It would also be nice to have good classes of examples where Gr¨obner bases suffice to solve the problem completely. This investigation was carried out in [RR98] and in [CR97]. Here we report on some ideas explained there.

Definition 3.23 Given n indeterminates x1 , . . . , xn , let D be a full design and consider a standard set of power-products O ⊂ O(D) . Then there exists a unique minimal set, Min(O), of power-products which generates {x1 , . . . , xn }∗ \O, which means that every element in {x1 , . . . , xn }∗ \O is a multiple of an element of Min(O). The set of power-products in Min(O), which are not among the leading terms of the canonical polynomials of D, is called CutOut(O). The meaning is that O is the set of power-products not divisible by any element of the set CutOut(O) ∪ {Lt(f1 ), . . . , Lt(fn )}. Let us illustrate it with an example. Let D be a 32 full design. We know that the leading terms of the two generators of I(D) are x3 , y 3 . Moreover, we know (see Example 3.21) that O(D) = {1, x, y, x2 , xy, y 2 , x2 y, xy 2 , x2 y 2 }. If O1 := {1, x, y, x2 , xy, y 2 } then CutOut(O1 ) = {x2 y, xy 2 }. If O2 := {1, x, y, x2 , xy, x2 y} then CutOut(O2 ) = {y 2 }.

Example 3.24

Let D be a full design and O ⊆ O(D) a standard set of power-products. Let us denote by f1 , . . . , fn the canonical polynomials of D and let CutOut(O) := {T1 , . . . , Th }. Suppose we are able to find h more polynomials {g1 , . . . , gh } and a term-order σ such that: 1) Ltσ (gi ) = Ti , for i := 1, . . . , h; 2) {f1 , . . . , fn , g1 , . . . , gh } is a σ-Gr¨obner basis. Then we denote by I the ideal generated by {f1 , . . . , fn , g1 , . . . , gh } and we see that I(D) ⊂ I, hence I = I(F), where F is a fraction of D. Moreover Oσ (I) = Oσ (F) = O, hence O is a basis of R/I(F) by Corollary 2.18, and the fraction F, whose cardinality is #(O) by Theorem 2.17, is a solution to Problem 2’. Example 3.25

We continue with the same data as in Example 3.24. Suppose that the values of the coordinates are {0, 1, 2} for both x and y. Then f1 = x(x−1)(x−2), f2 = y(y−1)(y−2). If we take g1 := x(x−1)y, g2 := xy(y−1), then one checks that {f1 , f2 , g1 , g2 } is a σ-Gr¨ obner basis for every σ. Therefore I := (f1 , f2 , g1 , g2 ) defines a fraction F , which is easily seen to be {(0, 0), (0, 1), (0, 2), (1, 0), (2, 0), (1, 1)}, and O1 = {1, x, y, x2 , xy, y 2 } is a basis of R/I(F ). By the equivalence of Problem 2 and Problem 2’, we conclude that any polynomial model whose support is contained in O1 can be identified by F .

The example can be explained by the following facts. Definition 3.26 Let k be an infinite field, T := xa11 xa22 · · · xann and let α1 , . . . , αn be n sequences of elements of the base field k, where αr = (αr,i )i∈N for r := 1, . . . , n and αr,i 6= αr,j if i 6= j. Then we callQthe polynomial Q n 1 (xn − αn,i ) (x1 − α1,i ) · · · ai=1 D(T ) := ai=1 the distraction of T with respect to α1 , . . . , αn .

Proposition 3.27 Let T1 , . . . , Th be power-products such that Ti does not divide Tj for every i 6= j, let α1 , . . . , αn be n sequences of elements of k and let I := (D(T1 ), . . . , D(Th )) be the ideal generated by the corresponding distractions with respect to α1 , . . . , αn . Then {D(T1 ), . . . , D(Th )} is the reduced Gr¨ obner basis of I with respect to every term-order. Proof.

See [CarR90], Theorem 1.4 and Corollary 2.6.



It is clear that, in order to get a distraction of T , it suffices to have vectors of sufficiently many elements of k. So we can speak of distractions associated to vectors, not to sequences, and sometimes we can drop the assumption that k is infinite. Theorem 3.28 Let D be a full design with levels (l1 , l2 , . . . , ln ) and whose values of the coordinates are 0, 1, . . . , li − 1, for i := 1, . . . , n. Let I(D) := (f1 , . . . , fn ) be its defining ideal, where fj := xj (xj −1) · · · (xj −lj +1) are the canonical polynomials, for i = 1, . . . , n; let O(D) be its corresponding standard set of power-products and O a standard set of powerproducts contained in O(D). Assume that {T1 , . . . , Th } = CutOut(O). Let α1 , α2 , . . . , αn be permutations of (0, 1, . . . , l1 − 1), . . . , (0, 1, . . . , ln − 1) respectively and D(T1 ), . . . D(Th ) the distractions of T1 , . . . , Th with respect to α1 , α2 , . . . , αn . Then, for every term-order σ, we have: {f1 , . . . , fn , D(T1 ), . . . D(Th )} is a σ-Gr¨ obner basis and {D(T1 ), . . . , D(Th )} is the σ-canonical set of confounding polynomials of a fraction F, such that Oσ (F) = O. Proof.

See [RR98], Theorem 3.4.



Remark 3.29 It is easy to see that the choice of coordinates can be generalized to any set. It suffices then to change the permutations α1 , α2 , . . . , αn accordingly. Definition 3.30 Given a full design D, a term-order σ, and a standard set of powerproducts O ⊂ O(D), let F be a fraction of D, whose σ-canonical set of confounding polynomials is given by distractions as indicated in Theorem 3.28. Then F is called a distracted fraction. Definition 3.31 Given a full design D and a standard set of power-products O ⊂ O(D), we denote by Distr(O) the set of the distracted fractions, whose associated standard set of power-products is O for some term-order σ. Then we denote by Sol(O, ∀σ) the set of those fractions F such that Oσ (F) = O for every term-order σ. Corollary 3.32 Given the hypotheses of Theorem 3.28, Distr(O) ⊆ Sol(O, ∀σ) Proof.

It follows from Proposition 3.27.



Remark 3.33 In [RR98] some more theory is developed: for instance, an explicit formula for #(Distr(O)) is proved, and a closer look is taken at the difference between Distr(O) and Sol(O, ∀σ). The following example is drawn from that paper.

Example 3.34

Let D be the 2³ full design and O := {1, x, y, z}. Let F := {(0, 0, 0), (0, 0, 1), (0, 1, 0), (1, 1, 0)}. Then I(F) = (x² − x, y² − y, z² − z, xy − x, xz, yz) and {xy − x, xz, yz} is the DegRevLex-canonical set of confounding polynomials of F. Therefore O = ODegRevLex(F), and we deduce from [RR98], Theorem 3.13 that {xy − x, xz, yz} is the τ-canonical set of confounding polynomials of F for every term-order τ, and thus that F ∈ Sol(O, ∀σ). On the other hand, F ∉ Distr(O). Suppose the contrary; then it follows from [CarR90] that every pair of polynomials in {x² − x, y² − y, z² − z, xy − x, xz, yz} should be a Gröbner basis, but clearly {xy − x, xz} is not. With respect to the 2³ full design and the standard set of power-products O := {1, x, y, z}, we therefore conclude that Distr(O) ⊂ Sol(O, ∀σ). A suitable computation done with statfamilies shows that #(Distr(O)) = 8 and #(Sol(O, ∀σ)) = 32.
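A quick CoCoA check of this example (a sketch; the expected answers are given as comments) confirms that the quotient basis is {1, x, y, z} both for the default ordering DegRevLex and for Lex, in accordance with F ∈ Sol(O, ∀σ):

Use Q[x,y,z];
I := Ideal(x^2 - x, y^2 - y, z^2 - z, x*y - x, x*z, y*z);
QuotientBasis(I);  -- expected: 1, x, y, z (in some order)
Use Q[x,y,z], Lex;
I := Ideal(x^2 - x, y^2 - y, z^2 - z, x*y - x, x*z, y*z);
QuotientBasis(I);  -- expected: again 1, x, y, z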

We have already seen that Gröbner basis theory in general cannot solve Problem 2′ completely. But there are nice situations where it does. We recall the following:

Theorem 3.35 Given a full design D, a term-order σ and a standard set of power-products O ⊂ O(D), there is an algorithm which computes Sol(O, σ).

Proof.

See [CR97], Theorem 3.3 and subsequent remarks.



Definition 3.36 Let O ⊂ {x1, . . . , xn}∗ denote a standard set of power-products, σ a term-order and T a power-product. We put σ(O, T) := {T′ ∈ O | T′ <σ T}.

Theorem 3.37 Given a full design D, a term-order σ and a standard set of power-products O ⊂ O(D) with CutOut(O) = {T1, . . . , Th}, if σ(O, Ti) = O for every i := 1, . . . , h, then Sol(O, σ) = Sol(O).

Consider, for instance, the 3 × 3 full design of Example 3.25 and the standard set O := {1, x, y, y²}, so that CutOut(O) = {x², xy}. If σ is a term-order such that x >σ y, then we match the assumptions of Theorem 3.37, since σ(O, x²) = σ(O, xy) = O. To solve Problem 2′ completely in this specific example, it suffices to find all the σ-canonical sets of confounding polynomials whose leading terms are x², xy. All the corresponding fractions F are such that Ltσ(I(F)) is minimally generated by {x², xy, y³}.

Example 3.38

Let g1 := x² + a1y² + a2x + a3y + a4 and g2 := xy + a5y² + a6x + a7y + a8. The theory says that we have to impose that {f1, f2, g1, g2} is a σ-Gröbner basis. This yields the following algebraic system:

a7² a8 − a6 a7 + a5 = 0,   a7³ + 3a7² + 2a7 = 0,
a5 a7² + 3a5 a7 + 2a5 = 0,   a7² a8 − a6 a7 + a5 = 0,
a4 a5 a7 − a1 a3 − a2 a5 − 3a1 = 0,   a5 a7 a8 + a3 a5 − a5 a6 − a1 a7 = 0,
a4 a7 a8 − a3 a4 − a4 a6 − a2 a8 − 3a4 a8 − 3a4 = 0,
a4 a7² − a3² − a2 a7 + a1 − 3a3 − 2 = 0,
a4 a6 a7 − a2 a3 − a4 a5 − a2 a6 + 2a4 a8 − 3a2 = 0,
a6 a7² − a5 a7 + 3a6 a7 + 2a7 a8 − 3a5 = 0,
a7 a8² − a4 a7 + a3 a8 − 2a6 a8 − 3a8² − a2 − 3a4 = 0,
a6 a7 a8 + a3 a6 − a6² − a2 a7 − a5 a8 + 2a8² − a1 + 2a4 = 0.

Solving the system yields 81 solutions, which correspond to 81 among all the (9 choose 4) = 126 fractions of D of cardinality 4. These 81 fractions are all the fractions F of D such that {1, x, y, y²} is a basis of R/I(F). In this case Problem 2′ is solved completely.
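As an illustration (the specific fraction is chosen here only for the purpose of the example), the fraction F := {(0, 0), (0, 1), (0, 2), (1, 0)} is one of these 81 fractions: its vanishing ideal is I(F) = (x² − x, xy, y(y − 1)(y − 2)), it contains f1 and f2, and {1, x, y, y²} is a basis of R/I(F). A CoCoA sketch, with the expected answers in the comments:

Use Q[x,y];
I := Ideal(x^2 - x, x*y, y*(y-1)*(y-2));
NF(x*(x-1)*(x-2), I);  -- expected: 0, i.e. f1 lies in I
QuotientBasis(I);      -- expected: 1, x, y, y^2 (in some order)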

We conclude the survey by saying a few words about Problem 3. As before, it is easy to see that the problem can be rephrased as Problem 3′: Let s < #(D). What are the fractions F of D of cardinality s with the highest (lowest) number of standard sets of power-products which are a basis of R/I(F) as a k-vector space? Again Example 3.21 shows that the theory of Gröbner bases cannot give the full solution. Here we show that it is possible to characterize those fractions with a unique standard set of power-products as a basis of R/I(F). First, we recall the following fact from [RR98].

Theorem 3.39 Given a design D and a standard set of power-products O ⊂ O(D), let F be a fraction. Let σ be a term-order and Gσ := {g1, . . . , gh} the σ-canonical set of confounding polynomials of F. If we write gi := Ti − ri, with Ti := Ltσ(gi) for i := 1, . . . , h, then the following conditions are equivalent:

1) T divides Ti for every T in the support of ri and every i := 1, . . . , h;
2) Gσ is the τ-canonical set of confounding polynomials of F for every term-order τ;
3) F ∈ Sol(O, ∀σ).

Proof.

See [RR98], Theorem 3.13.
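For instance, for the fraction of Example 3.34 we have g1 = xy − x, g2 = xz, g3 = yz, so that r1 = x and r2 = r3 = 0; since x divides xy, condition 1) is satisfied, and the theorem recovers the fact, already noted there, that {xy − x, xz, yz} is the τ-canonical set of confounding polynomials of F for every term-order τ, i.e. that F ∈ Sol(O, ∀σ).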



We conclude by proving the following.

Theorem 3.40 Given a design D and a standard set of power-products O ⊂ O(D), let F be a fraction. Then the following conditions are equivalent:
1) O is the unique standard set of power-products which is a basis of R/I(F) as a k-vector space;
2) F ∈ Sol(O, ∀σ).

Proof. The fact that 1) implies 2) is obvious. Let us prove that 2) implies 1). If F ∈ Sol(O, ∀σ), then we deduce from Theorem 3.39 that there exists a set of polynomials G := {g1, . . . , gh} which is the σ-canonical set of confounding polynomials for every σ. Moreover these polynomials have the shape gi := Ti − ri, with Ti = Lt(gi) for every σ, and T divides Ti for every T ∈ Supp(ri). Let Supp(ri) := {Ti1, . . . , Timi}; then Ti, Ti1, . . . , Timi are linearly dependent modulo I(F). Now let O′ be another standard set of power-products which is a basis of R/I(F) as a k-vector space. The key remark is that Ti cannot be in O′: otherwise all the Tij would also be in O′, simply because Tij divides Ti, and then Ti, Ti1, . . . , Timi would be linearly dependent elements of the basis O′, a contradiction. This fact implies that O′ ⊆ O, hence the two sets are equal, simply because they have the same cardinality. □

Remark 3.41 This theorem shows that not only the distracted fractions, but also fractions such as the one of Example 3.34, identify a unique maximal complete linear polynomial model.

We conclude the survey by summarizing some of the most important facts treated above. Putting together Corollary 3.32, Example 3.21, the subsequent discussion, Theorem 3.35, Theorem 3.37 and Theorem 3.40, we conclude that
Distr(O) ⊆ Sol(O, ∀σ) ⊆ Sol(O, ∃σ) ⊆ Sol(O).
All the inclusions can be strict, and the set Sol(O, ∀σ) is precisely the set of those fractions which identify a unique maximal complete linear polynomial model. Moreover, given a term-order σ, there is an algorithm which computes all the elements in Sol(O, σ), and we have established in Theorem 3.37 a set of sufficient conditions for Sol(O, σ) to be equal to Sol(O). The problem of computing Sol(O) in general was solved recently (see [CR01]), using the notion of a border basis.
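For instance, the 2³ example of Example 3.34 shows that the first inclusion can be strict, since #(Distr(O)) = 8 while #(Sol(O, ∀σ)) = 32.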

3.3 Two Examples

We conclude this chapter with two important examples, which yield a motivation for introducing border bases.

Example 3.42

Consider the set {(0, 0), (0, −1), (1, 0), (1, 1), (−1, 1)} of five points in A²_Q, denote by I ⊂ P = Q[x, y] the vanishing ideal of this set, and denote by O the tuple (1, x, y, x², y²). The evaluation matrix of the elements of O at the points has determinant −4, hence (1, x, y, x², y²) is a Q-basis of P/I. Consequently, there is a Q-linear surjective map P −→ V(O) which sends every polynomial f to the uniquely determined element a1 + a2x + a3y + a4x² + a5y² of V(O) such that the residue class of f in P/I is a1 + a2x + a3y + a4x² + a5y². However, this map is not induced by a map of the type NFσ,I, since {1, x, y, x², y²} is not of the form T² \ LTσ(I) for any term ordering σ.
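The value of the determinant can be checked directly in CoCoA. The following session is a sketch (the list Points and the matrix M are introduced here only for this check); it builds the evaluation matrix row by row, in the order in which the points are listed above:

Use Q[x,y];
Points := [[0,0], [0,-1], [1,0], [1,1], [-1,1]];
M := Mat([[1, P[1], P[2], P[1]^2, P[2]^2] | P In Points]);
Det(M);  -- expected: -4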

Example 3.43

Use Q[x,y];
I := Ideal(x^2 + y^2 + x + 2y - 1, x^2 - y^2 - x);
ReducedGBasis(I);
[y^2 + x + y - 1/2, x^2 + y - 1/2]
-------------------------------
QuotientBasis(I);
[1, y, x, xy]
-------------------------------

Use Q[x,y], Lex;
I := Ideal(x^2 + y^2 + x + 2y - 1, x^2 - y^2 - x);
ReducedGBasis(I);
[x + y^2 + y - 1/2, y^4 + 2y^3 - 1/4]
-------------------------------
QuotientBasis(I);
[1, y, y^2, y^3]
-------------------------------

Use Q[x,y], DegLex;
I := Ideal(x^2 + 1/10000xy + y^2 + x + 2y - 1, x^2 - y^2 - x);
ReducedGBasis(I);
[xy + 20000y^2 + 20000x + 20000y - 10000,
 y^3 - 2666400000000/133333333y^2 - 2666533330000/133333333x
     - 7999799980000/399999999y + 3999799990000/399999999,
 x^2 - y^2 - x]
-------------------------------
QuotientBasis(I);
[1, y, y^2, x]
-------------------------------

Use Q[x,y], Lex;
I := Ideal(x^2 + 1/10000xy + y^2 + x + 2y - 1, x^2 - y^2 - x);
ReducedGBasis(I);
[x - 133333333/2666533330000y^3 + 266640000/266653333y^2
   + 99979998/799959999y - 399979999/799959999,
 y^4 + 266660000/133333333y^3 + 20000/399999999y^2
   - 10000/399999999y - 100000000/399999999]
-------------------------------
QuotientBasis(I);
[1, y, y^2, y^3]
-------------------------------

References

[AKR01] J. Abbott, M. Kreuzer, L. Robbiano, Computing Zero-Dimensional Schemes, Preprint (2001).

[CR97] M. Caboara, L. Robbiano, Families of Ideals in Statistics, Proceedings of ISSAC 1997, Maui, Hawaii, July 1997 (New York, N.Y.), W.W. Küchlin, Ed., 404–409.

[CR01] M. Caboara, L. Robbiano, Families of Estimable Terms, Proceedings of ISSAC 2001, London, Ontario, Canada (New York, N.Y.), B. Mourrain, Ed., 56–63.

[CarR90] P. Carrà, L. Robbiano, On super G-bases, J. Pure Appl. Algebra 68 (1990), 279–292.

[KR00] M. Kreuzer, L. Robbiano, Computational Commutative Algebra 1, Springer-Verlag (2000).

[KR03] M. Kreuzer, L. Robbiano, Computational Commutative Algebra 2, Springer-Verlag, in preparation.

[PRW00] G. Pistone, E. Riccomagno, H.P. Wynn, Computational Commutative Algebra in Statistics, Monographs on Statistics and Applied Probability 89, Chapman & Hall/CRC (2000).

[R98] L. Robbiano, Gröbner Bases and Statistics, in "Gröbner Bases and Applications", Proc. of the Conference "33 Years of Gröbner Bases", London Mathematical Society Lecture Notes Series, Vol. 251, B. Buchberger and F. Winkler (eds.), Cambridge University Press, 179–204.

[R01] L. Robbiano, Zero-Dimensional Ideals or the Inestimable Value of Estimable Terms, Proceedings of the Academy Colloquium "Constructive Algebra and Systems Theory", Amsterdam, November 2000, to appear.

[RR98] L. Robbiano, M.P. Rogantin, Factorial Designs and Distracted Fractions, in "Gröbner Bases and Applications", Proc. of the Conference "33 Years of Gröbner Bases", London Mathematical Society Lecture Notes Series, Vol. 251, B. Buchberger and F. Winkler (eds.), Cambridge University Press, 473–482.
