Sensitivity Analysis for the Decomposition of Mixed Partitioned Multivariate Models into Two Seemingly Unrelated Submodels

The paper is focused on the decomposition of mixed partitioned multivariate models into two seemingly unrelated submodels in order to obtain more efficient estimators. The multiresponses are independently normally distributed with the same covariance matrix. The partitioned multivariate model is considered either with, or without an intercept. The elimination transformation of the intercept that preserves the BLUEs of parameter matrices and the MINQUE of the variance components in multivariate models with and without an intercept is stated. Procedures on testing the decomposition of the partitioned model are presented. The properties of plug-in test statistics as functions of variance components are investigated by sensitivity analysis and insensitivity regions for the significance level are proposed. The insensitivity region is a safe region in the parameter space of the variance components where the approximation of the variance components can be used without any essential deterioration of the significance level of the plug-in test statistic. The behavior of plug-in test statistics and insensitivity regions is studied by simulations.


Introduction
A multivariate approach to modeling (see, e.g., Anderson 1958;Kshirsagar 1972;Kubáček 2008;Seber 2004) has several advantages in comparison with a series of univariate models.Specifically, multivariate models respect the association between outcomes, and thus, in general, procedures are more efficient.Further, they can evaluate the joint influence of predictors on all outcomes and avoid the issue of multiple testing.On the other hand, there are situations when the multivariate model can be decomposed to a series of simpler models, univariate or multivariate, depending on the issue.Moreover, from a practical point of view, collecting data is usually easier in decomposed models.
The paper deals with a special case of a decomposition of a partitioned multivariate model with independent multiresponses with the same covariance matrix into two seemingly unrelated multivariate submodels (Zellner 1962) in order to obtain more efficient estimators.Namely, the multiresponse variables in the model are partitioned into two sets Y 1 and Y 2 .Similarly, the set of predictors is partitioned into two sets X 1 and X 2 .As an example, let us consider the nutrigenomic study in the mouse.The response variable might be expressions of chosen genes (Y 1 i• ) and concentrations of hepatic fatty acids (Y 2 i• ) measured on subjects.The predictors might be genotype (X 1 ) and type of diet (X 2 ).The problem is to decide, roughly speaking, if it is possible to explain separately expressions of genes by genotype and hepatic fatty acids concentrations by diet or not.Fišerová and Kubáček (2012) proposed tests for the verification of the significance of a model decomposition under normality of random errors in the case when the covariance matrix is known or completely unknown.Further, Fišerová and Kubáček (2013) shown that the proposed tests may be used in models without an intercept, as well as in models with an intercept.These tasks are summarized in Section 2. Here, we will focused on the situation when the covariance matrix includes unknown variance components.
If variance components can be estimated via the maximum likelihood method, the technique of Kenward and Roger (1996) is useful for testing hypotheses about the decomposition of the model.Nevertheless, the maximum likelihood approach is suitable for replicated models or models with large number of observations.In the paper we consider a model without replications when the minimum norm quadratic unbiased estimators (MINQUE) based on Rao's procedure (Rao and Kleffe 1988) are used instead.This approach is valid even for models with small number of observations.The MINQU estimators are derived in Section 3.Estimated values of variance components can be plugged into the test statistic for a known covariance matrix.The investigation of statistical properties of a plug-in test statistic is rather difficult and therefore we will study the quality of a plug-in test statistic as a function of the variance components by sensitivity analysis.The sensitivity approach provides the so-called insensitivity regions (Kubáček 1996) in the space of variance components where the approximation of variance components do not cause any essential damage of the chosen statistical characteristic.Namely, we propose the insensitivity region for the significance level (Kubáček 2007b) as it is shown in Section 4. If we know that the true value of the variance components is with sufficiently high probability within the insensitivity region for the significance level, then the significance level of the plug-in test statistic does not exceed the chosen tolerable value.The sensitivity approach is investigated mostly in univariate models, e.g., Kubáček (1996); Fišerová and Kubáček (2003, 2004, 2006); Kubáček and Fišerová (2003); Lešanská (2002a,b).Some results for multivariate models are presented in Kubáček (2006Kubáček ( , 2007a,b) ,b) and Fišerová and Kubáček (2009).The behavior of plug-in statistics and insensitivity regions is studied by simulations in Section 5.

Tests of the decomposition in case of a known covariance matrix
Let us consider the multivariate model in a partitioned form . (1) ) is a random matrix (observation matrix), X = (X 1 , X 2 ) is a known design matrix, B 11 , B 12 , B 21 and B 22 are matrices of unknown parameters and ε = (ε 1 , ε 2 ) is a random error matrix.We will assume that the matrix X is of full column rank, the multiresponses are independent with the same positive definite covariance matrix Σ and the random errors are normally distributed.The covariance matrix Σ of the multiresponse Further, let us consider a system of two seemingly unrelated (Zellner 1962) multivariate submodels (3) with the covariance matrix Σ of the multiresponse Y i• in the form (2). Note that models in (3) are seemingly unrelated because there is a link between them described by cov(Y 1 i• , Y 2 i• ) = Σ 12 .If Σ 12 = 0, the models in (3) are independent.The problem is to decide which of the models (1) and (3) should be chosen for modeling in order to obtain more efficient estimators.
The issue with a decomposition of model ( 1) into (3) leads to testing the hypothesis that "the system of two seemingly unrelated multivariate submodels (3) is a true model", i.e., to test B 12 = 0 and B 21 = 0 simultaneously.If the covariance matrix Σ is known, Fišerová and Kubáček (2012) proposed the test statistics The symbol Tr(Σ) denotes trace of the matrix Σ and To test the hypotheses B 21 = 0 and B 12 = 0 simultaneously, one can use, e.g., the Bonferroni correction in order to preserve the type I error rate α.More precisely, if distribution, neither of the hypotheses B 21 = 0, B 12 = 0 can be rejected on the significance level α.
Note that the decomposition of model (1) leads to two seemingly unrelated submodels.If the decomposition is significant, the prediction of Y 1 conditional on X 1 is not improved also by regressing on X 2 .However the predictors X 2 are necessary for the calculation of the prediction of Y 1 .Analogous conclusions hold for the prediction of Y 2 .
Until now we have considered only the model without an intercept.A partitioned form of the model with the intercept can be written as where 1 is a vector of ones.We will assume that the design matrix (1, X 1 , X 2 ) is of full column rank, and therefore all regression parameters are unbiasedly estimable.If the model includes also an intercept, then the question is, where the intercept should go in the decomposed model, in X 1 , in X 2 , or in both X 1 and X 2 .Naturally, all cases are possible and results depend on particular tasks.To avoid this situation, Fišerová and Kubáček (2013) proposed the transformation for an elimination of the intercept that leads to the identical BLUEs of parameter matrices B 11 , B 12 , B 21 and B 22 in model ( 6) with the intercept and model without the intercept in the form Here M 1 = I − 1(1 1) −1 1 .Therefore testing the decomposition of the partitioned model with the intercept can be done similarly as for the model without the intercept.The process is the following.First we transform model ( 6) to ( 7).Next we test hypotheses B 12 = 0 and B 21 = 0 simultaneously via test statistics T 12 , T 21 using the substitution Obviously, test statistics T 12 and T 21 have the same degrees of freedom in the case of the model without the intercept since the assumptions on full column rank of the design matrices imply that the ranks of the transformed design matrices M 1 X j are equal to k j , j = 1, 2, as well.
Lemma 1 The ϑ 0 -locally MINQUE of ϑ in the model ( 1) is where the ith component of the vector γ = ( γ 1 , . . ., γ s ) is Under normality, the covariance matrix of the estimator ϑ at the point ϑ 0 is Proof.For simplicity, the proof proceeds for the univariate form of model ( 1) which can be expressed as Here, the symbol vec(Y 1 ) denotes the column vector composed of the columns of Y 1 .The notation ⊗ means the Kronecker multiplication of matrices (Rao and Mitra 1988).Then, according to Rao and Kleffe (1988), the ϑ 0 -locally MINQUE of the vector parameter ϑ in model ( 9) is ] + , where the ith component of the vector γ is given as

Now it is sufficient to take into account the equality
, which is substituted into the previous formulas, and to simplify each of the expressions. 2 The estimator of the variance components and its covariance matrix depends on approximate values ϑ 0 .To eliminate this dependency, it is necessary to use an iterative procedure.The calculated estimated values of the variance components are used in the next iteration as approximate ones.The iterative procedure is very robust.It usually stops after two iterations for any initial value of the variance components even for distributions different from the Gaussian (Bognárová, Kubáček, and Volaufová 1996).
Note that the iterated MINQUE is practically the same as the maximum likelihood estimator in the case of Gaussian distribution.Moreover, the MINQUE procedure can be used even for negative variance components and symmetric matrices V i for errors with normal distribution.The formulas for the estimators are the same as under the assumption of positive variance components and p.s.d.matrices V i .
If model ( 1) can be decomposed into (3), the variance components can be estimated on the basis of either Y 1 or Y 2 in model (3).The explicit formulas for ϑ 0 -LMINQUE in the submodels follows directly from Lemma 1. Particularly, using the notation Σ jj,0 for an approximate value of matrix Σ jj , j = 1, 2, the expressions for ϑ 0 -LMINQUE of ϑ in submodels (3) are equal to The ith component of the vector γ and the (p, q)th element of the matrix S Σ −1 jj,0 are given as If the mixed partitioned model includes also an intercept, the elimination transformation ( 7) can be used.This transformation preserves not only the BLUEs of the regression parameters matrices, but also the estimates of the variance components, as it will be shown in the following theorem.
Proof.For the sake of simplicity, the proof proceeds for the univariate form of model ( 6).
Let us denote Then the model ( 6) can be rewritten as By Rao and Kleffe (1988), the ϑ 0 -locally MINQUE of ϑ in model ( 10) is given as where γ i = Y G(V i ⊗ I)GY, i = 1, . . ., s. Further, using the relationships we obtain Thus, the matrix G can be rewritten as Therefore, the equalities also holds.Summarizing the above results we obtain ϑ γ, i.e., the ϑ 0 -LMINQUE of ϑ in model ( 10) and in the model ), the last model can be rewritten into multivariate form (7), and thus the proof is finished. 2 If the variance components are estimated, the hypothesis about the decomposition of model ( 1) can be tested by the plug-in statistics T 21 and T 12 , when the matrices Σ jj = s i=1 ϑ i V i,(jj) , where V i,(jj) is the corresponding part of V i , j = 1, 2, are plugged into formulas (4).Testing the decomposition of the model with the intercept (6) can be done by the plug-in statistics T 21 and T 12 for the transformed model ( 7), since the transformation affects neither the BLUEs of the regression parameter matrices nor the MINQUE of the variance components.
Obviously, the substitution of the true values of the variance components by their estimated values influences the optimum quality of the estimators B 21 , B 12 and, consequently, the significance level and the power of the test.The investigation of statistical properties of the plug-in test statistics T 21 and T 12 is rather difficult and therefore we will study the quality of the plug-in test statistic as a function of the variance components by sensitivity analysis as it is shown in the next section.

Sensitivity analysis for the significance level
The main idea of the sensitivity approach (Kubáček 1996) is to consider the plug-in statistic as a function of the variance components and to find a safe region in the parameter space of the variance components where the approximation of the variance components does not cause any essential damage of the significance level of the plug-in test statistic (Kubáček 2007b).The plug-in test statistic can have a higher significance level.Let ε > 0 be the maximum admissible increase of the significance level.The goal is to find a region in the parameter space of the variance components such that shifts δϑ around the true value ϑ * within this region cause the significance level of the plug-in test statistic T 21 to be not greater than α/2 + ε/2.(We consider the significance level α/2 + ε/2 since the Bonferroni correction for multiple tests on B 21 = 0 and B 12 = 0 is used.)Such a region is called an insensitivity region for the significance level and will be denoted by N ε,T 21 .More precisely, N ε,T 21 is a neighborhood of the vector ϑ * with the property The derivation of the insensitivity region for the significance level is based on an approximation of the plug-in test statistic T 21 by T 12 (ϑ) = T 21 (ϑ * ) + δT 21 .The variable δT 21 = δϑ ∂T 21 ∂ϑ characterizes the change of the statistic T 21 (ϑ * ) caused by the shift δϑ around ϑ * .Obviously, the significance level of T 21 increases with increasing δT 21 and vice versa.Hence the problem is to find the upper limit for δT 21 so that the significance level increased by a maximum tolerated value.Using the Chebyshev inequality it holds that The inequality (11) together with the probability statement for the tolerated significance level implies that for a sufficiently large t > 0 such that the inequality E(δT 21 ) + t var(δT 21 ) ≤ δ ε,T 21 , where is a sufficient condition for the upper limit for δT 21 .The explicit form of the insensitivity region N ε,T 21 is stated in Theorem 2.

Theorem 2
The insensitivity region N ε,T 21 for the significance level of the statistic T 21 is where δ ε,T 21 is given by ( 13), and t > 0 is a sufficiently large number such that the probability statement ( 12) holds.
Proof.It is necessary to determine the mean value and the variance of variable δT 21 which characterizes a change of the statistic T 21 (ϑ * ) caused by the shift δϑ.Let Analogous considerations can be made for the plug-in test statistic T 12 .In this case the explicit formula for the insensitivity region for the significance level results in The size of the insensitivity region depends on the parameters ε and t chosen by the user.
The parameter ε is related to the user's opinion that ε causes a tolerable increase of the significance level.The larger ε, the larger the insensitivity region, but also a higher significance level follows.The parameter t corresponds to the approximation of the plug-in test statistic, namely with the upper limit for the variable δT 21 that describes the change of the statistic T 21 (ϑ * ) caused by the shift δϑ.For t = 5, from the Chebyshev inequality ( 11) it follows that at least 96% of the data values of δT 21 must be within 5 standard deviations of the mean or, equivalently, no more than 4% of the data values can be more than 5 standard deviations away from the mean.If δT 21 is approximately normally distributed, at least 99.7% of the data values of δT 21 must be within 3 standard deviations of the mean.Hence it is reasonable to choose the parameter t in the interval 3, 5 .The smaller t, the larger the insensitivity region but also cases a higher tail probability.The procedure for the optimal choice of the parameter t that maximizes the size of the insensitivity region is derived in Lešanská (2002a).
Both insensitivity regions N ε,T 12 and N ε,T 21 are suitable for a justification of the utilization of plug-in joint tests T 12 , T 21 for a decomposition of model ( 1) into two seemingly unrelated submodels (3).The process is as follows.First, we determine estimates of the variance components.Then we compute the insensitivity regions N ε,T 12 and N ε,T 21 for the estimated values of the variance components and chosen values ε and t.Finally, we set the confidence domain for the variance components for a sufficiently high confidence level and check whether this confidence domain is embedded into the insensitivity regions.If this confidence domain is included into both insensitivity regions, plug-in joint tests are admissible and, moreover, the significance level of plug-in joint tests does not exceed the value of α + ε.If the confidence domain is not embedded into both insensitivity regions, the experiment requires better design, other measurement devices, or more observations to be sure that the approximation of the variance components by their estimates do not cause an increase in significance level by more than a tolerable ε.The criterion is very demanding, in some cases the confidence domain is not embedded into the insensitivity regions, however, the estimated values of the variance components lie in the insensitivity regions what implies that the increase of the significance level is almost a tolerable one (see Section 5).
The determination of an exact confidence domain for the variance components is difficult since the distribution of the estimator ϑ is unknown even for a normally distributed vector vec(Y).Some approximation can be derived using the Bonferroni inequality and the Chebyshev inequality which imply that Hence at least a (1 − α)100%-confidence domain for the variance components is a set given by the Cartesian product Recall that the covariance matrix of the variance components estimator is given by ( 8).
It can be easily proved, similarly as in Lešanská (2002b), that a k 2 -multiple of ϑ * makes a homothetic change of the boundary of the insensitivity region for the significance level with the coefficient k 2 and the centre at the point 0.

Simulation study
By simulations we will study the behavior of the plug-in test statistics T 12 , T 21 for the decomposition of model ( 1) and the insensitivity regions for the significance level.We will consider different choices of the covariance matrix, parameter matrices, number of observations and the true model (model (1) or the system of seemingly unrelated submodels (3)).
We considered n = 40 and n = 400 observations, a multiresponse Y j i• , j = 1, 2, with dimensions p 1 = 3 and p 2 = 4, and the number of regressors equal to k 1 = 2 and k 2 = 2.The parameter matrices were chosen as The design matrices were considered in the form of X = 1 10 ⊗ T and X = 1 100 ⊗ T, with the matrix The symbol 1 10 denotes the vector of 10 ones.Similar designs of experiments were used in the simulation study in Fišerová and Kubáček (2012).
Table 1: Empirical probabilities (in %) of rejecting the hypothesis "the true model is the system of two seemingly unrelated models (3)" on the significance level α.Data are simulated from model (3).
5 5.2 4.9 6.1 5.9 5.2 5.1 5.7 5.6 (14) 1 0.9 1.1 1.5 1.2 0.9 1.1 1.3 1.3 100*(14) 5 5.1 5.1 5.6 5.6 4.6 5.0 6.2 5.6 100*(14) 1 1.1 1.1 1.5 1.1 1.0 1.2 1.3 1.3 10 000 simulations were done for all cases.First, the data were simulated from the system of two seemingly unrelated submodels (3), i.e., for matrices B 1 , B 2 given by ( 14) and for zero matrices B 12 , B 21 .The empirical probabilities of rejecting the hypothesis B 12 = 0 and B 21 = 0 simultaneously are presented in Table 1.We can see that the obtained empirical significance levels for plug-in test statistics T 12 and T 21 are essentially equal to the nominal levels.Fišerová and Kubáček (2012) have shown in a simulation study that the joint test by T 12 and T 21 is conservative in case of a known covariance matrix, the obtained empirical significance level was equal to half of the nominal one.For data simulated from the model (1), i.e., for matrices B 11 , B 22 and B 12 , B 21 given by ( 14) and ( 15), respectively, the plug-in test statistics T 21 and T 12 rejected the decomposition of model ( 1) in all cases.
Finally we will investigate the insensitivity regions for the significance level.Let us assume a tolerable increase of the significance level α = 1% or α = 5% equal to ε = 5%.It means, we are satisfied if the true type I error rate is α/2 + ε/2 = 5% (3%) for a nominal significance level α = 5% (α = 1%) for each of the plug-in test statistic T 21 and T 12 .Further, we assume that the parameter t equals 3, i.e., at least 89% of data values of δT 21 and δT 12 must be within 3 standard errors of the mean.
For α = 1%, the resulting insensitivity regions are displayed together with 95%-confidence domain for the variance components in Figure 1.The data were simulated from model (3) for the covariance matrix Σ 2 and ϑ 1 = 5, ϑ 2 = 3.We can see that the insensitivity regions N 5,T 21 and N 5,T 12 are large enough.The statistic T 12 allows greater shifts in direction of ϑ 1 , and T 21 in direction of ϑ 2 .It means, the statistic T 12 is more sensitive to changes in ϑ 2 , and T 21 in ϑ 1 .Furthermore, we can notice, that the confidence domain for the variance components is embedded in both insensitivity regions for sample size n = 400 (left figure).For smaller sample size, the confidence domain increases more than the insensitivity ones (enlarged only slightly) and thus the confidence domain is not embedded into the insensitivity regions.Nevertheless, almost all variance components estimates lie within the insensitivity regions.Unfortunately, this is not generally true.The observed relative frequencies of the variance components estimates within the insensitivity regions are indicated in Table 2.The results are averages of a hundred times repeated 10 000 simulations.Obviously, the confidence region for the variance components is smaller for greater sample size and thus the relative frequencies are essentially 100%.However, for smaller sample size and significance level α = 5%, the relative frequencies are only 55-70%.Nevertheless, in this case the plug-in test is sufficiently good as it is shown in Table 1.Large differences between the relative frequencies are due to large differences in size of the insensitivity regions for significance levels α = 5% and α = 1%.For example, in the case ϑ 1 = 5 and ϑ 2 = 3, the semiaxes of insensitivity region N 5,T 21 are 1.75 and 1.28 for α = 5%, and 4.35 and 2.98 for α = 1%.This effect is related to the construction of the insensitivity regions, namely with the fact that a tolerable increase ε of the significance level α leads to larger δ ε,T 21 , δ ε,T 12 for smaller α.
Note that the insensitivity regions for the significance level are shown as ellipses in Figure 1, although, in general, they should be open sets to the right hand corner.This relates to the fact that the p-values becomes less extreme with increasing variances.However, the construction Table 2: Relative frequency (in %) of the variance components estimates within insensitivity regions for the significance level.Data are simulated from model (3) for Σ 2 .
The insensitivity regions for the significance level and the confidence domain for variance components result in intervals for the covariance matrix Σ 1 since in this case the statistic T 21 is a function of the parameter ϑ 1 only, and T 12 is a function of ϑ 2 .The obtained results are similar as in the case of the covariance matrix Σ 2 and therefore they are omitted.

Conclusion
The proposed plug-in joint test seems to be a proper method for a decomposition of a mixed multivariate model (with independent responses with the same covariance matrix) into two seemingly unrelated submodels.The decomposition is advantageous at least from two viewpoints.The estimators of the regression parameters are more efficient, and data collection can be easier.The sensitivity approach is an appropriate technique for a justification that the estimated values of the variance components can be plugged in without any essential deterioration of the regression coefficients estimates and the inference.This is based on identifying safe regions in the space of the variance components where plug-in estimators cause only negligible changes of the optimum quality of estimators and test statistics.In particular, the proposed insensitivity region for the significance level guarantees that the true type I error rate does not exceed the chosen tolerable value.
The used methodology is general and suitable for more complex (mixed) models, e.g.models with restrictions on the regression parameters, singular models, and other statistical inference such as a confidence level or power of a test as well.