Stochastic Design Criteria in Linear Models

Abstract: Within the framework of the classical linear regression model, stochastic optimal design criteria are considered. As examples, a line fit model and a k-way linear fit model are taken. If the optimal design does not exist, an approach consisting in choosing an efficient design is suggested.


Introduction
The literature on optimal design criteria is very extensive; for references see Pukelsheim (1993) and Liski et al. (2002), for example. Among the criteria there are the so-called classical criteria like A-, D- or E-optimality as well as relatively new ones like the stochastic optimality criteria. The stochastic criteria have gained momentum only recently, though the best known criterion of this type was put forward more than thirty years ago (see Sinha, 1970).
In the paper we consider the classical linear regression model

Y = Xβ + e, e ~ N_n(0, σ² I_n), (1)

where the n × 1 response vector Y = (Y_1, Y_2, ..., Y_n)^T follows a multivariate normal distribution, X = (x^{(1)}, x^{(2)}, ..., x^{(n)})^T is the n × k design matrix of full rank k ≤ n, β = (β_1, β_2, ..., β_k)^T is the k × 1 parameter vector, E(Y) = Xβ is the expectation vector of Y and σ² I_n is the covariance matrix of Y, where I_n is the n × n identity matrix and σ > 0 is unknown.
Let β̂ be the least squares estimator of β, being at the same time the best linear unbiased estimator. As is well known, β̂ = (X^T X)^{-1} X^T Y ~ N_k(β, σ²(X^T X)^{-1}).
With a design ξ we associate its k × k moment matrix M(ξ) = Σ_{i=1}^m p_i x^{(i)} x^{(i)T}. If p_i = n_i/n, i = 1, 2, ..., m, m ≤ n, where the n_i are integers and Σ_{i=1}^m n_i = n, then the covariance matrix of β̂ is (σ²/n) M^{-1}(ξ). Throughout the paper, we write β̂ = β̂(ξ) or β̂ = β̂(M) to emphasize the dependence of β̂ on the design ξ or on the moment matrix M, respectively.
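As a small illustration of these definitions, the moment matrix of a design and the resulting covariance of the least squares estimator can be computed as follows (a sketch; the support points, weights, σ² and n below are illustrative choices, not values from the paper):

```python
import numpy as np

def moment_matrix(points, weights):
    """M(xi) = sum_i p_i x^(i) x^(i)T for support points x^(i) with weights p_i."""
    X = np.asarray(points, dtype=float)   # m x k matrix of support points
    p = np.asarray(weights, dtype=float)  # weights summing to one
    return (X * p[:, None]).T @ X

# Two-point line fit design {a, b; p, 1-p} with regressors (1, z)
a, b, p = 0.0, 1.0, 0.5
M = moment_matrix([[1.0, a], [1.0, b]], [p, 1 - p])

sigma2, n = 1.0, 100                              # assumed values
cov_beta_hat = (sigma2 / n) * np.linalg.inv(M)    # covariance of beta hat
```

For the design {0, 1; 0.5, 0.5} this yields M = [1, 0.5; 0.5, 0.5], in line with the two-point moment matrix discussed later.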
Austrian Journal of Statistics, Vol. 34 (2005), No. 2, 211-223

In the paper we refer to a line fit model when we have n ≥ 2 uncorrelated responses

Y_ij = β_1 + β_2 z_i + e_ij, i = 1, 2, ..., m; j = 1, 2, ..., n_i, (2)

with expectations E(Y_ij) = β_1 + β_2 z_i and variances V(Y_ij) = σ². In this case an approximate design ξ specifies distinct values z_1, ..., z_m chosen from a given experimental domain (usually an interval [a, b]) and assigns to them weights p_1 > 0, ..., p_m > 0, respectively. Of course, these weights satisfy Σ_{i=1}^m p_i = 1.

In the paper we also consider a k-way line fit model with or without an intercept. In the first case we have n ≥ k + 1 uncorrelated responses

Y_ij = β_0 + β_1 x_1^{(i)} + ... + β_k x_k^{(i)} + e_ij, (3)

with unknown parameters (β_0, β_1, ..., β_k) and experimental conditions x^{(i)} = (x_1^{(i)}, ..., x_k^{(i)})^T; in the second case we have

Y_ij = β_1 x_1^{(i)} + ... + β_k x_k^{(i)} + e_ij, (4)

with unknown parameters (β_1, β_2, ..., β_k) and experimental conditions x^{(i)}. In both cases the assumptions on e_ij are the same as in (2).

The paper is organized as follows. Classical and stochastic optimality criteria are discussed in Section 2. There the problem of establishing the corresponding optimal designs in the above-mentioned models is also considered. Section 3 is devoted to the concept of efficient designs in situations where the optimal designs do not exist. The proofs of the theorems can be found in the Appendix.

Optimality Criteria
An optimality criterion F is a function from the closed cone of nonnegative definite matrices into R^1_+. We say that the design ξ* is F-optimal if and only if

F(M(ξ*)) = min_ξ F(M(ξ)).

Recall the definitions of the classical optimality criteria.
• The D-criterion: F(M) = det(M^{-1});
• The A-criterion: F(M) = tr(M^{-1});
• The E-criterion: F(M) = λ_max(M^{-1}).

Here, λ_max(M^{-1}) denotes the maximal eigenvalue of the matrix M^{-1}. It is well known that det(M^{-1}), called the generalized variance, determines the volume of the ellipsoid of concentration for β̂. Its minimization leads to the ellipsoid of concentration with the smallest volume. On the other hand, minimization of tr(M^{-1}), called the average variance, is the minimization of the sum of variances of the β̂_i, i = 1, 2, ..., k. At last, minimization of λ_max(M^{-1}) leads to the ellipsoid of concentration having the smallest length of the maximal axis. In all three cases we set F(M) = ∞ if the matrix M is degenerate. This means that designs with degenerate moment matrices can be excluded from consideration. Therefore, we can assume that the support size of designs satisfies the inequality m ≥ k (or m ≥ k + 1 when we deal with a k-way line fit model with an intercept).

Saying 'stochastic optimality criteria' we mean functions depending on the moment matrices through a probability. A typical example is the family of criteria

P(β̂ − β ∈ A), A ∈ A, (5)

where A is a given class of bounded subsets of R^k containing the origin. The class A can be interpreted as a collection of sets determining a system of neighbourhoods of the origin.
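The three classical criteria can be written directly as functions of a moment matrix, with F(M) = ∞ for degenerate M as stipulated above (a sketch; the test matrix is an illustrative choice, not one from the paper):

```python
import numpy as np

def d_criterion(M):
    """Generalized variance det(M^{-1}): volume of the concentration ellipsoid."""
    d = np.linalg.det(M)
    return np.inf if d <= 0 else 1.0 / d

def a_criterion(M):
    """Average variance tr(M^{-1}): sum of the variances of the beta_i."""
    if np.linalg.det(M) <= 0:
        return np.inf
    return np.trace(np.linalg.inv(M))

def e_criterion(M):
    """lambda_max(M^{-1}): length of the longest axis of the ellipsoid."""
    lam_min = np.linalg.eigvalsh(M)[0]   # eigvalsh returns ascending order
    return np.inf if lam_min <= 0 else 1.0 / lam_min

M = np.diag([1.0, 0.25])   # illustrative nondegenerate moment matrix
# d_criterion(M) = 4.0, a_criterion(M) = 5.0, e_criterion(M) = 4.0
```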
Here, we would like to choose a design which guarantees the maximal probability for the estimator β̂ of being 'close' to β. Of course, the terminology 'stochastic' is rather relative.
It is due to Sinha (1970), who introduced the concept of the distance stochastic (DS) criterion in certain treatment design settings. Liski et al. (1999) studied the properties of this criterion under the classical linear regression model (1). In case of a degenerate matrix M, it is natural to set the probability in (5) equal to zero. We obtain the DS-criterion taking A to be the class of all k-dimensional balls centered at the origin:

A = {{x ∈ R^k : ||x|| ≤ ε}, ε > 0},

where ||·|| stands for the usual Euclidean norm in R^k. Here, we assume the system of neighbourhoods to be balls, one of the most natural choices. Observe that the DS-criterion, in fact, is a family of DS(ε)-criteria indexed by ε > 0. We say that a design is optimal with respect to a family of criteria if and only if it is optimal with respect to each criterion from this family. One can recall the families of criteria popular in the literature: the φ_p-criterion (see Pukelsheim, 1993, Chapter 6) or the characteristic criterion (see Rodrigues-Diaz and López-Fidalgo, 2003). It should be emphasized, however, that the DS(ε)-optimal design, i.e. the design which is optimal with respect to the DS(ε)-criterion for given ε > 0, is not of great interest itself since usually it depends on unknown σ. Liski et al.
(1999, Theorem 5.1) studied the behavior of the DS(ε)-criterion when ε approaches 0 and ∞. These limiting cases have an interesting relationship with the classical D- and E-optimality criteria. It turns out that the DS(ε)-criterion is equivalent to the D-criterion as ε → 0 and to the E-criterion as ε → ∞. Moreover, minimization of the probability P(||β̂(ξ) − β|| > ε) simultaneously for all ε > 0 is equivalent to minimization of Eg(||β̂(ξ) − β||) for all increasing functions g such that the expectation exists (see Marshall and Olkin, 1979, Chapter 17.A). In particular, one can take g(x) = x². Therefore, if a design is DS-optimal then it is also A-optimal.

Zaigraev (2002) suggested a natural extension of the DS-criterion called the shape stochastic (SS_ρ) criterion. Here,

A = {εA_ρ, ε > 0}, A_ρ = {x ∈ R^k : ||x|| ≤ ρ(x/||x||)},

and ρ is a positive continuous function defined on the unit sphere S^{k−1} in R^k. In particular, the SS_ρ-criterion is simply the DS-criterion if ρ(·) ≡ 1. Again one can note that the SS_ρ(ε)-optimal design, in general, depends on unknown σ.
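For a given design, the DS(ε)-criterion value can be approximated by Monte Carlo using the normal law of β̂ − β stated above. The sketch below assumes illustrative values of σ² and n; the function name and the sample size are arbitrary choices:

```python
import numpy as np

def ds_probability(M, eps, sigma2=1.0, n=100, draws=200_000, seed=0):
    """Monte Carlo estimate of P(||beta_hat - beta|| <= eps) for a design
    with moment matrix M, using beta_hat - beta ~ N(0, (sigma^2/n) M^{-1})."""
    rng = np.random.default_rng(seed)
    cov = (sigma2 / n) * np.linalg.inv(M)
    z = rng.multivariate_normal(np.zeros(M.shape[0]), cov, size=draws)
    return float(np.mean(np.linalg.norm(z, axis=1) <= eps))
```

As a sanity check, for M = I_2 and σ²/n = 1 the squared norm is chi-square with 2 degrees of freedom, so P(||β̂ − β|| ≤ ε) = 1 − exp(−ε²/2) exactly.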
In the sequel, we confine ourselves to the case where A_ρ is a convex and symmetric (with respect to the origin) set. This restriction has the following sense. We say that a design ξ_1 dominates a design ξ_2 if M_1 − M_2 is nonnegative definite, i.e. M_1 ≥ M_2 in the Loewner sense, where M_1 and M_2 are the moment matrices of the designs ξ_1 and ξ_2, respectively. Thus the Loewner partial ordering among moment matrices induces a partial ordering among associated designs. Observe that the D-, A-, E- and DS-criteria are antitonic relative to the Loewner ordering, that is, for any two moment matrices M_1 and M_2 the inequality M_1 ≥ M_2 implies that the design with moment matrix M_1 is at least as good as the other under the criterion. Such a property is desirable if we deal with an optimality criterion. As it follows from Theorem 2 of Liski and Zaigraev (2001), the SS_ρ-criterion is antitonic relative to the Loewner ordering if and only if the set A_ρ is convex and symmetric with respect to the origin.
Zaigraev (2002, Theorems 1 and 2) established the limit behavior of the SS_ρ-criterion when ε approaches 0 and ∞. It turns out that the SS_ρ(ε)-criterion is equivalent to the D-criterion as ε → 0 and, under mild regularity conditions on ρ, to the minimax criterion

F(M) = max_{u ∈ ∂A_ρ} u^T M^{-1} u

as ε → ∞. Here, ∂A_ρ is the boundary of the set A_ρ.

Optimal Designs for a Line Fit Model
Consider model (2) with z_i ∈ [a, b], i = 1, 2, ..., m. Searching for optimal designs, in this situation it is enough to consider only two-point designs of the form ξ_p = {a, b; p, 1 − p}, 0 < p < 1 (see e.g. de la Garza, 1954; Liski and Zaigraev, 2001, Lemma 1). That is, the support consists of the extreme points of the experimental domain. The moment matrix of such a design has the form

M(ξ_p) = [ 1, pa + (1 − p)b; pa + (1 − p)b, pa² + (1 − p)b² ].

It is not difficult to calculate the optimal designs with respect to the classical optimality criteria. In particular, the D-optimal design is ξ_D = ξ_0.5, while the A- and E-optimal weights depend on a and b. Observe that if a = −b (symmetric experimental domain), then the design {−b, b; 0.5, 0.5} is D-, A- and E-optimal. As to DS-optimality, the following general result holds.
Theorem 1. The design {−b, b; 0.5, 0.5} is optimal for model (2) on the symmetric experimental domain [−b, b] with respect to any stochastic criterion of the form (5) with A a class of convex and symmetric (with respect to the axes) sets in R².
Theorem 1 is a direct extension of Lemma 2 of Liski and Zaigraev (2001). Its proof, in fact, repeats the proof of that lemma, modified slightly in accordance with the form of the moment matrix in the situation considered. Observe that if a ≠ −b (asymmetric experimental domain), then the D-, A- and E-optimal designs are different and, therefore, a DS-optimal design does not exist. However, it is of interest to note that sometimes in such a situation SS_ρ-optimal designs exist (see Zaigraev, 2002, Section 3).
Dealing with the SS_ρ-criterion in model (2), we confine ourselves to classes of sets centered at the origin such as squares, rectangles and ellipses. The choice between these cases can be made based on the problems to be solved or our personal preferences. In the first two cases, by Theorem 1, the SS_ρ-optimal design on the experimental domain [−b, b] exists; it is {−b, b; 0.5, 0.5}. In the last case, however, the SS_ρ-optimal design does not exist due to lack of symmetry with respect to the axes.

Now, consider the case of an asymmetric experimental domain. For definiteness, we take [a, b] = [0, 1]. In accordance with the above-mentioned calculations, ξ_D = {0, 1; 0.5, 0.5}, ξ_A = {0, 1; 2 − √2, √2 − 1} and ξ_E = {0, 1; 0.6, 0.4}. The DS-optimal design does not exist; for given ε > 0 the DS(ε)-optimal design depends on ε. Denote by C the class of all DS(ε)-optimal designs and add to C the two limiting designs as ε → 0 and ε → ∞. The same notation concerns the SS_ρ-criterion as well.
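The classical optimal weights on [0, 1] can be checked numerically by minimizing each criterion over p for the two-point design {0, 1; p, 1 − p} (a sketch; the grid resolution is an arbitrary choice):

```python
import numpy as np

def M_of(p, a=0.0, b=1.0):
    """Moment matrix of the two-point design {a, b; p, 1-p}."""
    q = 1.0 - p
    return np.array([[1.0,             p * a + q * b],
                     [p * a + q * b,   p * a**2 + q * b**2]])

ps = np.linspace(0.01, 0.99, 9801)                       # step 1e-4
d_vals = [1.0 / np.linalg.det(M_of(p)) for p in ps]      # det(M^{-1})
a_vals = [np.trace(np.linalg.inv(M_of(p))) for p in ps]  # tr(M^{-1})
e_vals = [1.0 / np.linalg.eigvalsh(M_of(p))[0] for p in ps]  # lambda_max(M^{-1})

p_D = ps[np.argmin(d_vals)]   # close to 0.5
p_A = ps[np.argmin(a_vals)]   # close to 2 - sqrt(2)
p_E = ps[np.argmin(e_vals)]   # close to 0.6
```

The three minimizers differ, which is the numerical face of the non-existence of a DS-optimal design on an asymmetric domain.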
As we have mentioned earlier, for model (2) on the experimental domain [0, 1] it is enough to consider only two-point designs of the form {0, 1; p, 1 − p}, 0 < p < 1. Below we give numerical results of calculating C in four cases: for the DS-criterion and for the SS_ρ-criterion (squares, rectangles and ellipses). For given ε > 0 the weight p for the corresponding optimal design depends on ε. However, it is more convenient to use δ = √n ε/σ instead of ε to express this dependence. Observe that the limiting optimal designs as ε → ∞ coincide with those calculated theoretically (see Zaigraev, 2002, Section 3).

Optimal Designs for a k-way Line Fit Model
Consider model (3), that is, a k-way line fit model with an intercept. Here, as the experimental domain we take the ball Z_r = {z ∈ R^k : ||z|| ≤ r} of radius r > 0 (cf. Pukelsheim, 1993, Section 15.5; Liski et al., 1999; Liski and Zaigraev, 2001). Recall that for model (3) the smallest possible support size of a feasible design is m = k + 1.
As is shown in Theorem 4.2 of Liski et al. (1999), if r = √k, then the DS-optimal design with the support size m = k + 1 on Z_√k exists. This is a so-called regular simplex design (see also Pukelsheim, 1993, Section 15.12). Such a design has the weights 1/(k + 1) and the support vectors z^{(1)}, ..., z^{(k+1)}, which belong to the boundary of the experimental domain and form a regular simplex, that is, ||z^{(1)}|| = √k, ..., ||z^{(k+1)}|| = √k and z^{(i)T} z^{(j)} = −1 for all i ≠ j ≤ k + 1. Observe that a regular simplex design has the identity moment matrix, and an orthogonal transformation of a regular simplex design is again a regular simplex design. Now, we extend this result to the case of arbitrary r > 0.

Theorem 2. A design ξ with the support size m = k + 1 for model (3) on the experimental domain Z_r is DS-optimal if and only if it is a regular simplex design, that is, the design having the weights 1/(k + 1) and the support vectors z^{(1)}, ..., z^{(k+1)} satisfying ||z^{(1)}|| = r, ..., ||z^{(k+1)}|| = r and z^{(i)T} z^{(j)} = −r²/k for all i ≠ j ≤ k + 1. The moment matrix of such a design has the form M = diag(1, r²/k, ..., r²/k).
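A regular simplex design is easy to instantiate numerically. The projection-based construction below is one of several possibilities (an illustrative sketch, not the paper's construction): project the standard basis of R^{k+1} onto the hyperplane orthogonal to (1, ..., 1) and rescale the k + 1 resulting vertices to norm r.

```python
import numpy as np

def regular_simplex(k, r):
    """Return a (k+1) x k matrix whose rows are regular simplex support vectors."""
    V = np.eye(k + 1) - np.full((k + 1, k + 1), 1.0 / (k + 1))
    Q, _ = np.linalg.qr(V.T)      # first k columns span the hyperplane sum(x)=0
    Z = V @ Q[:, :k]              # (k+1) x k coordinates of the simplex vertices
    return Z * (r / np.linalg.norm(Z[0]))   # rescale support vectors to norm r

k, r = 3, 2.0
Z = regular_simplex(k, r)
G = Z @ Z.T                       # Gram matrix: ||z^(i)||^2 = r^2, z^(i)T z^(j) = -r^2/k
Zb = np.hstack([np.ones((k + 1, 1)), Z])   # regressors (1, z^(i)T) of model (3)
M = Zb.T @ Zb / (k + 1)           # moment matrix, equals diag(1, r^2/k, ..., r^2/k)
```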
It should be emphasized that the result of Theorem 2 also extends the corresponding result for model (2) on the experimental domain [−b, b] (cf. Theorem 1). Now, we consider model (4), that is, a k-way line fit model without an intercept. As experimental domains, we take the sets X_1, X_2 and X_3, where X_2 ⊂ X_1 and X_3 ⊂ X_1.

Theorem 3. DS-optimal designs with the smallest possible support size m = k for model (4) on the experimental domains X_1, X_2 and X_3 exist and coincide. This is a so-called orthonormal design, that is, the design having the weights 1/k and the orthogonal support vectors x^{(1)}, ..., x^{(k)} satisfying ||x^{(1)}|| = r, ..., ||x^{(k)}|| = r and x^{(i)T} x^{(j)} = 0 for all i ≠ j ≤ k. The moment matrix of such a design has the form M = (r²/k) I_k.
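An orthonormal design is likewise simple to instantiate and check (a minimal sketch with assumed k and r; the scaled standard basis is the most obvious choice of orthogonal support vectors):

```python
import numpy as np

k, r = 4, 1.5
X = r * np.eye(k)                 # support vectors r*e_1, ..., r*e_k of norm r
p = np.full(k, 1.0 / k)           # equal weights 1/k
M = (X * p[:, None]).T @ X        # sum_i p_i x^(i) x^(i)T = (r^2/k) I_k
```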
Efficient Designs

If the DS-optimal design does not exist, the natural approach consists in choosing an efficient design (see e.g. Li and Chan, 2002; Rodrigues-Diaz and López-Fidalgo, 2003), that is, the design which is optimal for a certain DS(ε*)-criterion and at the same time performs well under the other DS(ε)-criteria.

Appendix
Proof of Theorem 2. We begin with two auxiliary results.

Lemma 1. Let Z be a (k + 1) × k matrix of full rank k and e ≠ 0 a given vector in R^{k+1}. Then, under the condition Z^T e = 0,

Z^T Z = c I_k if and only if Z Z^T = c [I_{k+1} − e e^T / ||e||²],

for any given c > 0.
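Before the proof, the claimed equivalence can be sketched numerically; the matrices below are arbitrary choices satisfying the hypotheses, built from a random QR factorization:

```python
import numpy as np

rng = np.random.default_rng(1)
k, c = 3, 2.5

# Orthogonal (k+1) x (k+1) matrix: its columns are an orthonormal basis.
Q, _ = np.linalg.qr(rng.standard_normal((k + 1, k + 1)))
Z = np.sqrt(c) * Q[:, :k]         # then Z^T Z = c I_k
e = Q[:, k]                       # unit vector with Z^T e = 0

lhs = Z @ Z.T
rhs = c * (np.eye(k + 1) - np.outer(e, e) / (e @ e))
# lhs and rhs coincide, as Lemma 1 asserts
```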
Proof. (=⇒) Assume that Z^T e = 0 and Z Z^T = c[I_{k+1} − e e^T/||e||²]. Multiplying the last equality by Z^T on the left and by Z on the right, we get (Z^T Z)² = c Z^T Z. Therefore, also Z^T Z = c I_k, since the matrix Z^T Z is nonsingular.

(⇐=) Assume that Z^T e = 0 and Z^T Z = c I_k. Then the columns of Z/√c together with e/||e|| form an orthonormal basis of R^{k+1}, so that Z Z^T/c + e e^T/||e||² = I_{k+1}. But the last equality is equivalent to ||e||² Z Z^T + c e e^T = c ||e||² I_{k+1}. The lemma is proved.

Now, we establish the D-optimal design.

Lemma 2. A design ξ with the support size m = k + 1 for model (3) on the experimental domain Z_r is D-optimal if and only if it is a regular simplex design.
Proof. Write M(ξ) = (1, Z)^T P (1, Z), where 1 = (1, ..., 1)^T, Z = (z^{(1)}, z^{(2)}, ..., z^{(k+1)})^T and P = diag(p_1, p_2, ..., p_{k+1}). Observe that det P = Π_{i=1}^{k+1} p_i ≤ (k + 1)^{−(k+1)}, and the equality holds if and only if p_1 = p_2 = ... = p_{k+1} = 1/(k + 1). Therefore, searching for the D-optimal design it is enough to consider only designs with equal weights. By the properties of determinants (see Rao, 1973, p. 32), det M(ξ) can be bounded from above, the bound being attained if and only if Σ_{i=1}^{k+1} z^{(i)} = 0. On the other hand, by the properties of the trace we have tr(Z^T Z) = Σ_{i=1}^{k+1} ||z^{(i)}||² ≤ (k + 1) r², and the upper bound is attained if and only if ||z^{(i)}|| = r, i = 1, 2, ..., k + 1. Clearly, under the constraint on tr(Z^T Z), the determinant det(Z^T Z) is maximal when all the eigenvalues of Z^T Z are equal. Summing up, one can note that the upper bound is attained here if and only if M(ξ) = diag(1, r²/k, ..., r²/k). It remains to show that only a regular simplex design has such a moment matrix.
Assume, first, that r² ≤ k. By Theorem 3.2 of Liski et al. (1999), in order to prove the theorem it is enough to show that for any given design ξ the eigenvalues λ_1(M(ξ)) ≥ λ_2(M(ξ)) ≥ ... ≥ λ_{k+1}(M(ξ)) of the moment matrix M(ξ) given by (6), arranged in the descending order of magnitude, satisfy the inequalities in (8). The last inequality there is evident in view of (6). By the Sturm separation theorem (see Rao, 1973, p. 64) the eigenvalues of M(ξ) are interlaced with those of its submatrix B. Therefore, in order to prove the inequalities in (8) it is enough to show (9). We have tr(B) = Σ_{i=1}^{k+1} p_i ||z^{(i)}||² ≤ r². If tr(B) = r², then (9) holds with equality in the last inequality (see Marshall and Olkin, 1979, Section 1.A). If tr(B) < r², then (9) holds all the more.
Proof of Theorem 3. First, observe that X_2 ⊂ X_1, X_3 ⊂ X_1 and both X_2 and X_3 contain all the orthonormal designs. Therefore, it is enough to prove the theorem only for the experimental domain X_1.
We start with proving that an orthonormal design is D-optimal. As in the proof of Theorem 2, it is easy to show that we can confine ourselves to the case of designs with equal weights. Such a design ξ has the moment matrix M(ξ) = (1/k) X^T X, where X = (x^{(1)}, ..., x^{(k)})^T. From the Hadamard inequality (Rao, 1973, p. 56) it follows that

det M(ξ) = k^{−k} det(X X^T) ≤ k^{−k} Π_{i=1}^k ||x^{(i)}||² ≤ (r²/k)^k.

Equality holds if and only if X X^T = r² I_k, that is, if and only if the design ξ is orthonormal. Thus, an orthonormal design is D-optimal on X_1.
In order to show that an orthonormal design is DS-optimal, it remains to prove that for any given design ξ the eigenvalues of its moment matrix satisfy inequalities analogous to those in the proof of Theorem 2; the argument there carries over. The theorem is proved.
For a design ξ_p ∈ C and given ε > 0, let ν(p, ε) denote its DS(ε)-efficiency. The design ξ_{p*} ∈ C such that

p* = Arg max_p min_{ε>0} ν(p, ε)

is called the efficient design. Let us keep the same considerations and notations for the SS_ρ-criterion as well. The numerical calculations give the following results. For the DS-criterion p* = Arg max