Noname manuscript No. (will be inserted by the editor)

Computing maximin efficient experimental designs using the methods of semidefinite programming

Lenka Filová, Mária Trnovská, Radoslav Harman

Received: date / Accepted: date

Abstract In the paper, we solve the problem of computing the maximin efficient design with respect to the class of all orthogonally invariant criteria. It turns out that on a finite experimental domain, the maximin efficient design can be computed by the methods of semidefinite programming. Using this approach, we can deal with the non-differentiability inherent in the problem, due to which the standard iterative procedures cannot be applied. We illustrate the results on the models of polynomial regression on a line segment and quadratic regression on a cube.

Keywords: optimal design; Ek-optimality; maximin efficiency; semidefinite programming; quadratic regression; polynomial regression

1 Introduction

Consider the linear regression model y(x) = f^T(x)θ + ε on the finite experimental domain X = {x_1, ..., x_n}, where y(x), x ∈ X, are the observations, f : X → R^m is a vector of known regression functions, θ ∈ R^m is a vector of unknown parameters, and ε is a random error with E(ε) = 0 and, without loss of generality, D(ε) = 1. The errors are assumed to be uncorrelated for different observations. For models with uncorrelated errors, an experimental design is usually defined as a probability measure ξ on X; see, e.g., [10] or [11]. On a finite experimental domain X, any design ξ is completely determined by a vector w ∈ R^n of weights of the points x_1, ..., x_n, in the sense that w_i = ξ(x_i). Therefore, for the purpose of this paper, a design is a vector from the n-dimensional unit simplex S_n = {w ∈ R^n : w_1 + ... + w_n = 1, w_1, ..., w_n ≥ 0}, with the interpretation that w_i gives the proportion of trials to be performed at x_i for all i = 1, ..., n. The performance of a design w ∈ S_n is based on the information matrix

M(w) = Σ_{i=1}^n w_i f(x_i) f^T(x_i).
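For concreteness, the information matrix can be assembled directly from this definition. The following sketch is purely illustrative and not part of the paper (whose computations use MATLAB); it assumes Python with numpy and evaluates M(w) for quadratic regression f(x) = (1, x, x^2)^T on a three-point domain:

```python
import numpy as np

def info_matrix(points, w, f):
    """Information matrix M(w) = sum_i w_i f(x_i) f(x_i)^T."""
    m = len(f(points[0]))
    M = np.zeros((m, m))
    for x, wi in zip(points, w):
        fx = f(x)
        M += wi * np.outer(fx, fx)   # rank-one elementary information matrix
    return M

f = lambda x: np.array([1.0, x, x**2])   # quadratic regression on a line
points = [-1.0, 0.0, 1.0]                # a small experimental domain
w = np.array([1/3, 1/3, 1/3])            # uniform design weights
M = info_matrix(points, w, f)
```

By construction M is a symmetric positive semidefinite m × m matrix; for this uniform three-point design its trace equals 7/3.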

This research was supported by the ERDF-017/2009/4.1/OPVaV-CESIUK project and the Slovak VEGA Grant No. 1/0077/09. Department of Applied Mathematics and Statistics, Faculty of Mathematics, Physics and Informatics, Comenius University Bratislava. E-mail: [email protected] (corresponding author)


We will consider a generalization of the standard design problem, where the information matrix is assumed to be a linear combination of general positive semidefinite matrices, not only those of rank 1. Thus, let H_1, ..., H_n ∈ S^m_+ be the "elementary information matrices" of type m × m, and let their convex hull H contain a regular matrix. We define the information matrix of a design w ∈ S_n by

M(w) = Σ_{i=1}^n w_i H_i.

This generalization can be used in a variety of less standard optimal experimental design problems and at the same time possesses properties similar to the original problem (see, e.g., [6], [16]). In some cases, this generalization significantly reduces the size of the maximin efficient design problem, which is exemplified on the model of quadratic regression on a cube in Section 4. Note that in the standard case the elementary information matrices H_i = f(x_i)f^T(x_i), i = 1, ..., n, correspond to the information matrices of the trials in the individual design points.

An optimality criterion Φ is a real-valued function defined on the set S^m_+ of all positive semidefinite matrices of type m × m, measuring the "size" of the information matrix. A design w* is Φ-optimal if it maximizes the value of the criterion, i.e., if Φ(M(w*)) ≥ Φ(M(w)) for all w ∈ S_n. The matrix M(w*) is then called a Φ-optimal information matrix and Φ(M(w*)) is the Φ-optimal value. An optimality criterion is called an information function if it is nonzero, Loewner isotonic, concave, positively homogeneous and upper semicontinuous (see [11], Chapter 5). Essentially any reasonable positively homogeneous optimality criterion that measures the amount of information gained from the experiment (rather than a "loss" from the experiment) is an information function. Moreover, the properties of information functions guarantee the existence of an optimal design.

The most frequently used information functions are the criteria of D-, E-, and A-optimality. For an information matrix M, the criterion of D-optimality is defined as Φ_D(M) = (det(M))^(1/m), the criterion of E-optimality as Φ_E(M) = λ_1(M), where λ_1 denotes the minimum eigenvalue, and the criterion of A-optimality as Φ_A(M) = m(tr(M^(-1)))^(-1) for a regular M and Φ_A(M) = 0 for a singular M.
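All three of these criteria depend on M only through its eigenvalues, so they can be evaluated directly from the spectrum. The following is an illustrative numpy sketch (not from the paper, which uses MATLAB/SeDuMi):

```python
import numpy as np

def phi_D(M):
    """D-optimality: (det M)^(1/m)."""
    m = M.shape[0]
    return max(np.linalg.det(M), 0.0) ** (1.0 / m)

def phi_E(M):
    """E-optimality: the smallest eigenvalue of M."""
    return np.linalg.eigvalsh(M)[0]   # eigvalsh returns ascending eigenvalues

def phi_A(M):
    """A-optimality: m / tr(M^{-1}) for regular M, and 0 for singular M."""
    m = M.shape[0]
    lam = np.linalg.eigvalsh(M)
    if lam[0] <= 0:                   # singular: criterion vanishes by definition
        return 0.0
    return m / np.sum(1.0 / lam)

M = np.diag([1.0, 2.0, 4.0])          # a toy information matrix
```

For this diagonal example Φ_D = 8^(1/3) = 2, Φ_E = 1, and Φ_A = 3/(1 + 1/2 + 1/4) = 12/7.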
In the paper, we will focus on the class O of all orthogonally invariant information criteria, defined as the set of information functions Φ that satisfy the condition of orthogonal invariance, i.e., Φ(UCU^T) = Φ(C) for all C ∈ S^m_+ and all orthogonal matrices U of type m × m. It is simple to show that O is the class of all those information functions that depend only on the eigenvalues of the information matrix, such as the criteria of D-, E-, and A-optimality. The quality of a design w compared to the Φ-optimal design w* is measured by its Φ-efficiency ([11], p. 132):

eff(w | Φ) = Φ(M(w)) / Φ(M(w*)).   (1)

In the paper we will be interested in finding the design whose performance is most stable under an arbitrary selection of an orthogonally invariant criterion, the so-called O-maximin efficient design. For this purpose, we first need to define the criteria based on partial sums of eigenvalues, introduced in [5]: let λ_1(M), ..., λ_m(M) be the eigenvalues of the information matrix M arranged in nondecreasing order, i.e., 0 ≤ λ_1(M) ≤ λ_2(M) ≤ ... ≤ λ_m(M). For k ∈ {1, ..., m}, the optimality criterion Φ_Ek is defined as the sum of the k smallest eigenvalues of the information matrix:

Φ_Ek : S^m_+ → [0, ∞),   Φ_Ek(M) = Σ_{i=1}^k λ_i(M).
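A direct evaluation of Φ_Ek from the spectrum takes one line of numpy (again an illustrative sketch, not part of the paper):

```python
import numpy as np

def phi_Ek(M, k):
    """Sum of the k smallest eigenvalues of the symmetric matrix M."""
    lam = np.linalg.eigvalsh(M)   # eigenvalues in ascending order
    return float(np.sum(lam[:k]))

M = np.diag([3.0, 1.0, 2.0])      # toy example with eigenvalues 1, 2, 3
```

For k = 1 this reduces to the E-optimality criterion, and for k = m it equals the trace of M (T-optimality).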


As a special case we obtain the criterion of E-optimality (for k = 1) and the criterion of T-optimality (for k = m; see [11], Section 6.5). Compared to standard optimality criteria such as Φ_D, the analytic properties of the criteria Φ_Ek are relatively complicated: they are not strictly concave (i.e., there can be more than one Φ_Ek-optimal information matrix), they can be nonzero on singular information matrices (i.e., there can be a singular optimal information matrix, except for the case k = 1), and they are not everywhere differentiable (except for the case k = m).

In [5], the problem of calculating the minimal efficiency of w with respect to the class O is studied. The main result of the cited paper is that the minimal efficiency with respect to the class O is equal to the minimal efficiency with respect to the class of Φ_Ek-optimality criteria. The result is stated in the following theorem (Theorem 7 of [5]):

Theorem 1 Let w ∈ S_n. Then

inf_{Φ∈O} eff(w | Φ) = min_{k=1,...,m} eff(w | Φ_Ek).   (2)
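Theorem 1 makes the infimum over the whole infinite class O computable from m numbers. As an illustrative sketch (numpy; the values v(1) = 0.2, v(2) = 1, v(3) = 3 for quadratic regression d = 2 are those reported in Table 1 of Section 4), the O-minimal efficiency of the uniform design on {−1, 0, 1} can be evaluated as follows:

```python
import numpy as np

f = lambda x: np.array([1.0, x, x**2])        # quadratic regression, m = 3
points, w = [-1.0, 0.0, 1.0], np.array([1/3, 1/3, 1/3])
M = sum(wi * np.outer(f(x), f(x)) for x, wi in zip(points, w))

v = {1: 0.2, 2: 1.0, 3: 3.0}                  # Phi_Ek-optimal values v(k), d = 2
lam = np.linalg.eigvalsh(M)                   # ascending eigenvalues of M(w)
effs = [np.sum(lam[:k]) / v[k] for k in (1, 2, 3)]
min_eff = min(effs)                           # = inf over all criteria in O
```

For this design the binding criterion is Φ_E1, giving a minimal efficiency of roughly 0.73.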

As a consequence, for calculating the minimal efficiency of the design w with respect to the whole class O we only need to find the efficiencies of w with respect to the criteria Φ_Ek, k = 1, ..., m. Therefore, our first aim is to find a vector w*_k that solves the optimization problem

v(k) = max_{w∈S_n} Φ_Ek(M(w)).   (3)

We will call it the E^H_k-optimal design problem. Knowing the Φ_Ek-optimal values v(k), k = 1, 2, ..., m, we can find the design which maximizes the minimal efficiency in the class O, i.e., the design which is optimal with respect to the criterion

Φ_O : S^m_+ → [0, ∞),   Φ_O(M) = min_{k=1,...,m} Φ_Ek(M) / v(k).   (4)

We will call it the O-maximin efficient design, or, briefly, the maximin efficient design. For a design w ∈ S_n, the value Φ_O(M(w)) will be called the O-minimal efficiency of w. Note that the idea of a maximin efficient design was introduced in the papers [3] and [8]. Since the functions Φ_Ek have, in general, complicated analytic properties, so does the function Φ_O. Therefore, the standard numerical procedures are difficult to apply to the calculation of the O-maximin efficient design. One possible approach is to use a differentiable approximation of the criteria (see [2]). A more efficient solution is based on the methods of semidefinite programming. Using the results of Alizadeh ([1]), we will show that the problems of Ek-optimality, as well as the problem of finding the maximin efficient design, can be formulated as semidefinite programming problems, a class of optimization problems which can be efficiently solved using the so-called interior point methods.

The rest of the paper is structured as follows. In Section 2, we formulate the problem of finding the Ek-optimal design as a semidefinite program. In Section 3, we show that the methods of semidefinite programming can also be used for finding the maximin efficient designs. Finally, in Section 4, we use the SeDuMi toolbox for MATLAB (see [13]) for solving the semidefinite programs, illustrating the proposed method on the models of polynomial regression on a line segment and quadratic regression on a q-dimensional cube.


2 Semidefinite formulation of the E^H_k-optimal design problem

By S^m, denote the vector space of all m × m symmetric matrices. By a linear matrix inequality we understand an inequality of the type

A_0 + Σ_{i=1}^N y_i A_i ⪰ 0,

where y ∈ R^N, A_i ∈ S^m, i = 0, 1, ..., N, and ⪰ is the Loewner partial ordering defined on S^m by A ⪰ B ⇔ A − B ∈ S^m_+. In semidefinite programming (SDP) (see, e.g., [14] for a broader treatment of the topic), we optimize a linear function subject to linear and linear matrix inequality constraints. Semidefinite programming is a special subclass of convex mathematical programming and unifies several standard optimization problems, such as linear or quadratic programming. SDP finds many applications in various areas: combinatorial optimization, system and control theory, mechanical and electrical engineering, or statistics. It is known that the problems of A-optimal design and E-optimal design can be formulated as SDP problems (see [15]).

It has been shown by Alizadeh [1] that the sum of the k largest eigenvalues of an affine combination of symmetric matrices can be characterized by an SDP problem. The following is a consequence of Theorem 4.3 of [1]:

Theorem 2 Let λ↓_1(A), ..., λ↓_m(A) be the eigenvalues of a symmetric matrix A arranged in nonincreasing order, i.e., λ↓_1(A) ≥ λ↓_2(A) ≥ ... ≥ λ↓_m(A). Then the sum λ↓_1(A) + ... + λ↓_k(A) of the k largest eigenvalues of A is the optimal value of the following SDP problem:

min kz + trace(V)
s.t. V + zI ⪰ A, V ⪰ 0,

with variables z ∈ R, V ∈ S^m.

Note that the problems of maximizing the sum of the k smallest eigenvalues of a symmetric matrix and minimizing the sum of the k largest eigenvalues are equivalent. Therefore, the E^H_k-optimality problem (3) can be formulated as an SDP problem in the following way:

max kz − trace(V)
s.t. V − zI + Σ_{i=1}^n w_i H_i ⪰ 0,
     V ⪰ 0, w ∈ S_n,   (5)

where z ∈ R, V ∈ S^m, w ∈ R^n are the variables. By solving the optimization problems (5) for k = 1, 2, ..., m we get the values v(1), v(2), ..., v(m), and therefore we can obtain the Φ_Ek-efficiencies of any design w ∈ S_n. Our next goal is to compute the O-maximin efficient design.
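The optimum in Theorem 2 is in fact attained at an explicit point: z equal to the k-th largest eigenvalue of A, and V collecting the excess of the eigenvalues above z. This closed form is a standard observation about the Alizadeh characterization, not spelled out in the paper; the following numpy sketch verifies feasibility and that the objective equals the sum of the k largest eigenvalues:

```python
import numpy as np

def topk_sum_sdp_point(A, k):
    """Feasible point (z, V) of the SDP in Theorem 2 whose objective
    k*z + tr(V) equals the sum of the k largest eigenvalues of A."""
    lam, U = np.linalg.eigh(A)          # ascending eigenvalues, orthonormal U
    lam, U = lam[::-1], U[:, ::-1]      # reorder to nonincreasing
    z = lam[k - 1]                      # k-th largest eigenvalue
    excess = np.maximum(lam - z, 0.0)   # (lambda_i - z)_+ for the top eigenvalues
    V = (U * excess) @ U.T              # V = sum_i (lambda_i - z)_+ u_i u_i^T
    return z, V

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
A = (B + B.T) / 2                       # a random symmetric test matrix
k = 2
z, V = topk_sum_sdp_point(A, k)
objective = k * z + np.trace(V)
top_k = float(np.sum(np.sort(np.linalg.eigvalsh(A))[-k:]))
```

In the eigenbasis of A, the constraint matrix V + zI − A has eigenvalues max(z − λ_i, 0) ≥ 0, so both constraints hold, and a short computation shows kz + tr(V) collapses to λ↓_1 + ... + λ↓_k.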

3 The semidefinite programming characterization of maximin efficiency

In the previous section we have shown that the Ek-optimality problem can be formulated as a semidefinite program. Our goal now is to find the maximin efficient design, i.e., to solve the following problem:

max min_{k∈{1,...,m}} Φ_Ek(M(w)) / v(k)
s.t. w ∈ S_n.   (6)


It can be easily seen that this problem is equivalent to

max t
s.t. Φ_Ek(M(w)) ≥ v(k)t, k = 1, 2, ..., m,
     w ∈ S_n.   (7)

From the characterization given in Theorem 2 it follows that

Φ_Ek(M(w)) = Φ_Ek(Σ_{i=1}^n w_i H_i) ≥ v(k)t

if and only if there exist z_k ∈ R and V_k ∈ S^m_+ such that

k z_k − trace(V_k) ≥ v(k)t,   V_k + Σ_{i=1}^n w_i H_i − z_k I ⪰ 0.
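The "if" direction of this equivalence can be checked numerically: for a given M = M(w), taking z_k equal to the k-th smallest eigenvalue of M and V_k collecting the deficit of the eigenvalues below z_k gives a feasible pair attaining k z_k − trace(V_k) = Φ_Ek(M). This explicit feasible point is our illustration (numpy sketch), not spelled out in the paper:

```python
import numpy as np

def feasible_point(M, k):
    """Feasible (z_k, V_k) with k*z_k - tr(V_k) equal to the sum of the
    k smallest eigenvalues of the symmetric matrix M."""
    lam, U = np.linalg.eigh(M)            # ascending eigenvalues, orthonormal U
    z = lam[k - 1]                        # k-th smallest eigenvalue
    deficit = np.maximum(z - lam, 0.0)    # (z - lambda_i)_+ for eigenvalues below z
    V = (U * deficit) @ U.T               # V_k = sum_i (z - lambda_i)_+ u_i u_i^T
    return z, V

M = np.diag([0.5, 1.0, 2.0, 4.0])         # a toy information matrix
k = 2
z, V = feasible_point(M, k)
lhs = k * z - np.trace(V)                 # equals Phi_Ek(M) = 0.5 + 1.0
```

In the eigenbasis of M, the matrix V_k + M − z_k I has eigenvalues max(λ_i − z_k, 0) ≥ 0, so both semidefinite constraints hold.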

Therefore, the problem (6) can be formulated as a semidefinite programming problem in the following way:

max t
s.t. k z_k − trace(V_k) ≥ v(k)t, k = 1, 2, ..., m,
     V_k + Σ_{i=1}^n w_i H_i − z_k I ⪰ 0, k = 1, 2, ..., m,
     V_k ⪰ 0, k = 1, 2, ..., m,
     w ∈ S_n,   (8)

with the variables t ∈ R, V_1, ..., V_m ∈ S^m, z = (z_1, ..., z_m)^T ∈ R^m and w = (w_1, ..., w_n)^T ∈ R^n. Thus, we have rewritten the problem (6) in terms of semidefinite programming, and therefore we can use the SeDuMi toolbox to compute the maximin efficient designs.

4 Examples

4.1 Polynomial model on a discretization of [−1, 1]

In this subsection, we apply our results to the polynomial model on the experimental domain X = {−1, −0.99, ..., 0.99, 1}. The polynomial model for a measurement at the point x is defined as y(x) = θ_0 + θ_1 x + θ_2 x^2 + ... + θ_d x^d + ε, where d is the degree of the model and θ_0, ..., θ_d are the parameters of the model. Hence the number of parameters is m = d + 1 and the (d + 1)-dimensional vector of regression functions is f(x) = (1, x, x^2, ..., x^d)^T. In this example, we use the (standard) Ek-criteria, the elementary information matrices being H_i = f(x_i)f^T(x_i), where x_i ∈ X, i = 1, ..., n. We analyze this model for the degrees d = 2, ..., 6 with the aim of arriving at the O-maximin efficient designs. Note that for higher degrees of the polynomial, the algorithm becomes numerically unstable due to the very high dimensions of the matrices in the SDP formulation (8).

First, we need to find the optimal values for the criteria of Ek-optimality. Using the methods developed in Section 2 and the SeDuMi toolbox for MATLAB, we have obtained the Φ_Ek-optimal values v(k) = v_d(k) for the degrees d = 2, ..., 6 and for k = 1, ..., d + 1 (see Table 1). Having these values, we were able to compute

k \ d        2             3             4             5             6
1        2.000×10^-1   4.000×10^-2   7.7520×10^-3  1.4683×10^-3  2.7374×10^-4
2        1             2.000×10^-1   4.000×10^-2   7.7518×10^-3  1.4684×10^-3
3        3             2             3.3333×10^-1  8.4876×10^-2  1.8619×10^-2
4                      4             2             3.3333×10^-1  8.4877×10^-2
5                                    5             3             4.3050×10^-1
6                                                  6             3
7                                                                7

Table 1  Φ_Ek-optimal values v(k) = v_d(k) for the degrees d = 2, ..., 6 of polynomial regression and for k = 1, ..., d + 1.

d = 2   points   -1       0        1
        weights  0.3234   0.3532   0.3234
        minimal efficiency: 0.7646

d = 3   points   -1       -0.45    0.45     1
        weights  0.2799   0.2200   0.2200   0.2799
        minimal efficiency: 0.6628

d = 4   points   -1       -0.65    0        0.65     1
        weights  0.2458   0.1687   0.1711   0.1687   0.2458
        minimal efficiency: 0.6403

d = 5   points   -1       -0.55    -0.2     0.2      0.55     1
        weights  0.3114   0.0135   0.1750   0.1750   0.0135   0.3114
        minimal efficiency: 0.6514

d = 6   points   -1       -0.65    -0.41    0        0.41     0.65     1
        weights  0.2979   0.0120   0.1718   0.0364   0.1718   0.0120   0.2979
        minimal efficiency: 0.6500

Table 2  The maximin efficient designs for polynomial regression of degrees d = 2, ..., 6, together with the O-minimal efficiency of each maximin efficient design.

the maximin efficient designs (4), using the semidefinite programming formulation (8) of problem (6) from Section 3 (see Table 2). Note that the values exhibited in Table 1 fall into the intervals for the Ek-optimal values obtained in [2]. The values for d = 2, 3, 4 are in accordance with the analytically obtained results from [5]. For the degree d = 2, the maximin efficient design shown in Table 2 has been obtained analytically in [5], and it is in accordance with our results. Notice also that the E- and A-optimal designs have a much lower O-minimal efficiency than the maximin efficient design, but the performance of the D-optimal design is better (see Figure 1). The construction of the D-optimal design is based on Chapter 9 of [11]. The construction of the E-optimal design follows from the theorems in [12] and in [11], Section 9.13.
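As a numerical cross-check (an illustration with numpy, not part of the original computation, which used SeDuMi), the O-minimal efficiency of the tabulated d = 2 design can be recomputed from the v(k) values in Table 1:

```python
import numpy as np

f = lambda x: np.array([1.0, x, x**2])       # quadratic regression, m = 3
points = [-1.0, 0.0, 1.0]                    # support of the d = 2 design (Table 2)
w = np.array([0.3234, 0.3532, 0.3234])       # maximin efficient weights (Table 2)
v = {1: 0.2, 2: 1.0, 3: 3.0}                 # Phi_Ek-optimal values (Table 1, d = 2)

M = sum(wi * np.outer(f(x), f(x)) for x, wi in zip(points, w))
lam = np.linalg.eigvalsh(M)                  # ascending eigenvalues of M(w)
effs = [float(np.sum(lam[:k]) / v[k]) for k in (1, 2, 3)]
min_eff = min(effs)                          # O-minimal efficiency of the design
```

Up to the rounding of the tabulated weights, min_eff reproduces the reported value 0.7646; note also that the efficiencies with respect to Φ_E1 and Φ_E3 come out nearly equal, as one expects the binding criteria of a maximin efficient design to be balanced.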

4.2 Quadratic regression on the q-dimensional cube

As a second example, consider the multiple quadratic regression on the q-dimensional cube X = [−1, 1]^q, given by the formula

y(x) = β_0 + Σ_{i=1}^q β_i x_i^2 + Σ_{i=1}^q β^(i) x_i + Σ_{i<j} β_ij x_i x_j + ε.   (9)
