The Multivariate Multisample Nonparametric Rank Statistics for the Location Alternatives

Multisample testing problems are among the most important topics in nonparametric statistics. Various nonparametric tests have been proposed for multisample testing problems involving location parameters, and the analysis of multivariate data is important in many scientific fields. This paper discusses a multivariate multisample testing problem based on the Jurečková-Kalina-type rank of distance. A multivariate Kruskal-Wallis-type statistic is proposed for testing the location parameter with both equal and unequal sample sizes. Simulations are used to compare the power of the proposed nonparametric statistics with that of Wilks' λ, Pillai's trace and the Lawley-Hotelling trace for various population distributions.


Introduction
Testing hypotheses is one of the most important challenges in nonparametric statistics. Various nonparametric tests have been proposed for one-sample, two-sample and multisample testing problems involving location, scale, location-scale and other parameters. Recent progress in computerized measurement technology has permitted the accumulation of multivariate data, increasing the importance of multivariate data in many scientific fields. When we consider testing a multivariate multisample hypothesis, one of the most important statistical procedures, we naturally consider vector-valued observations. If only a marginal study of each component of these vectors is carried out, then outliers, strongly influential points and useful relationships among variables may not be detected. Thus, a multivariate examination of the data is necessary. However, in many applications, the underlying distribution is not understood well enough to assume normality or any other specific distribution, and a nonparametric test statistic must be used. Because it is important to determine how to represent ranks for multivariate data in nonparametric statistics, various researchers have proposed rank tests based on distances between observations. Jurečková and Kalina (2012) proposed a rank test based on observation distances for two-sample problems and discussed the unbiasedness of the test statistics under the alternative hypothesis.
Bradley, Lepage and Baumgartner statistics. In addition, Murakami (2015b) considered the use of the Jurečková-Kalina-type rank of distance with the Wilcoxon-type statistic. We extend this concept of rank of distance to a multisample setting. In Section 2, we introduce multivariate multisample nonparametric statistics based on the Jurečková-Kalina-type rank of distance. We consider the Kruskal-Wallis test (Gibbons and Chakraborti 2010), the multisample median test (Hájek et al. 1999), the multisample Lepage-type test (Rublík 2007), Wilks' λ (Rencher 1998), Pillai's trace (Rencher 1998) and the Lawley-Hotelling trace (Rencher 1998) in this paper. In addition, we propose another type of multivariate Kruskal-Wallis test. In Section 3, we use simulation studies to compare the power of the proposed test with that of the multivariate multisample parametric and nonparametric tests for various distributions. The simulations include 100,000 Monte Carlo replications. Conclusions are stated in Section 4.

Multivariate multisample nonparametric statistics
In this section, we introduce the multivariate multisample nonparametric statistics for vector-valued observations. Multivariate analysis of variance (MANOVA) is one of the most important statistical procedures in many scientific fields, especially in biometry. However, in many applications, the underlying distribution is not understood well enough to assume normality or some other specific distribution. Additionally, if we analyze only the marginal components of the vector-valued observations, we may not detect outliers, strongly influential points and useful relationships among variables. We therefore need to determine how to represent ranks for vector-valued observations.
Let $\{x_{ij};\ i = 1, \ldots, k,\ j = 1, \ldots, n_i\}$ be $k$ independent samples from $p$-variate populations having continuous unknown distribution functions $F_i^{(p)}$. Under these circumstances, we are interested in testing for differences in the location parameter among the $k$ populations. To test this hypothesis, we utilize the multivariate multisample nonparametric statistics. For multivariate data in nonparametric statistics, it is important to determine how to represent the rank of a vector-valued observation. Jurečková and Kalina (2012) proposed a distance of observation for $k = 2$, and the proposed rank of distance was found to be invariant under a shift of the location parameter. To introduce their rank of distance, let $\zeta = (\zeta_1, \ldots, \zeta_{n_1+n_2}) = (x_{11}, \ldots, x_{1n_1}, x_{21}, \ldots, x_{2n_2})$ denote the pooled sample. For every fixed $j$ and under fixed $x_{1j}$, $1 \le j \le n_1$, they considered the distances $\{\ell^*_{jt} = L(x_{1j}, \zeta_t);\ t = 1, \ldots, n_1 + n_2,\ j \ne t\}$, where $L(\cdot, \cdot)$ denotes the Euclidean distance. Conditionally given $x_{1j}$, Jurečková and Kalina (2012) decided to work with the ranks of $\ell^*_{jt}$, $t = 1, \ldots, n_1 + n_2$, $j \ne t$.

Herein, we extend this concept of rank of distance to a multisample setting. Let $Z = (Z_1, \ldots, Z_N) = (x_{11}, \ldots, x_{1n_1}, x_{21}, \ldots, x_{2n_2}, \ldots, x_{k1}, \ldots, x_{kn_k})$ denote the pooled sample with $N = n_1 + \cdots + n_k$. We consider Jurečková-Kalina-type distances $\{\ell_{st} = L(Z_s, Z_t);\ s, t = 1, \ldots, N,\ s \ne t\}$ for every fixed $s$. Conditionally given $x_{ij}$, the vector $\{\ell_{u(i,j)v(i)};\ u(i,j) = \sum_{q=1}^{i-1} n_q + j,\ v(i) = \sum_{q=1}^{i-1} n_q + r,\ r = 1, \ldots, n_i,\ r \ne j\}$ is then a random sample from the distribution function $G_i^{(p)}$, where the sum $\sum_{q=1}^{i-1} n_q$ is defined as 0 for $i = 1$. Assuming that the distribution functions $G_i^{(p)}$ are continuous, we work with the rank of $\ell_{st}$, where $s, t = 1, \ldots, N$ and $t \ne s$ for fixed $s$. We then consider the following rank statistics:
• The Kruskal-Wallis statistic $T_1$ (Gibbons and Chakraborti 2010). Exact probabilities for the Kruskal-Wallis statistic are listed in Gibbons and Chakraborti (2010) for small sample sizes. The limiting distribution for the Kruskal-Wallis statistic is the chi-square distribution with $k - 1$ degrees of freedom.
• The multisample median statistic $T_2$ (Hájek et al. 1999). Exact probabilities for the multisample median statistic are listed in Jorn and Klotz (2002) for small sample sizes. The limiting distribution for the multisample median statistic is the chi-square distribution with $k - 1$ degrees of freedom.
• The multisample Lepage-type statistic $T_3$ (Rublík 2007). Exact probabilities for the multisample Lepage-type statistic are listed in Murakami (2008) for small sample sizes. The limiting distribution for the multisample Lepage-type statistic is the chi-square distribution with $2(k - 1)$ degrees of freedom.
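The rank-of-distance construction can be illustrated with a minimal Python sketch. It is not the authors' code (the paper reports using R). It assumes one plausible reading of the text: for a fixed index s, the N − 1 distances from Z_s to the other pooled observations are ranked, and the classical tie-free Kruskal-Wallis formula is applied to those ranks grouped by sample. The helper names `distance_ranks` and `kruskal_wallis` and the toy data are hypothetical.

```python
import math


def distance_ranks(pooled, s):
    """Rank the Euclidean distances from pooled[s] to every other point.

    Returns {t: rank} for all t != s, with ranks 1..N-1.
    Assumes the distances are tie-free (continuous data).
    """
    dists = sorted(
        (math.dist(pooled[s], z), t) for t, z in enumerate(pooled) if t != s
    )
    return {t: r for r, (_, t) in enumerate(dists, start=1)}


def kruskal_wallis(rank_groups):
    """Classical Kruskal-Wallis H for tie-free ranks 1..M split into groups."""
    m = sum(len(g) for g in rank_groups)
    total = sum(sum(g) ** 2 / len(g) for g in rank_groups if g)
    return 12.0 / (m * (m + 1)) * total - 3.0 * (m + 1)


# Hypothetical toy data: k = 3 bivariate samples of size 2 each, pooled in order.
sizes = [2, 2, 2]
pooled = [(0.0, 0.0), (1.0, 0.5), (2.0, 2.5), (3.0, 1.0), (5.0, 4.0), (6.5, 5.0)]

s = 0  # fixed observation whose distances are ranked
ranks = distance_ranks(pooled, s)

# Split the ranks by sample; the sample containing Z_s loses one member.
groups, start = [], 0
for n in sizes:
    groups.append([ranks[t] for t in range(start, start + n) if t != s])
    start += n

h_s = kruskal_wallis(groups)  # statistic for this particular s
```

Because the statistic depends on the chosen s, the result is one value per pooled observation; how those values are combined or randomized is governed by the definitions in the text, which this sketch does not attempt to reproduce.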
Note that in a one-dimensional setting, the multisample median test uses less information than the Kruskal-Wallis test does, and may therefore be less powerful. The asymptotic relative efficiency of the multisample median test is 2/3 with respect to the Kruskal-Wallis test for a normal distribution (e.g., Gibbons and Chakraborti 2010). The multisample version of the Lepage statistic is preferable for location, scale and location-scale parameters. However, Rublík (2007) showed that the multisample version of a combination of the Kruskal-Wallis and multisample Mood statistics is more efficient than the multisample Lepage statistic for shifted location, scale and location-scale parameters with various distributions.
The statistic T where the randomization in (1) is independent of the observations. For any C, and the statistic rejects

Herein we suggest another multivariate Kruskal-Wallis-type test, namely $V^{(p)}$, as follows:

Simulation study
We employed R software to investigate the behavior of the $T_1$, $T_2$ and $T_3$ statistics in simulation studies. Additionally, we used Wilks' λ, denoted $W_\lambda$, Pillai's trace, denoted $PT$, and the Lawley-Hotelling trace, denoted $LH$, as classical MANOVA tests (Rencher 1998).
The simulations included 100,000 replications, and the significance level was 5%. To compare the power of the classical MANOVA tests and the tests based on the multivariate nonparametric statistics, we carried out a simulation study of different populations with various distributions.
• $t(\mu_i, \Sigma_i, \delta_i)$: the multivariate t distribution with $\delta_i$ degrees of freedom.
To generate random numbers, we used the R functions "mvrnorm," "rmt," and "rlnorm.rplus" for the multivariate normal, multivariate t and multivariate lognormal distributions, respectively. We define a p-dimensional matrix as follows: $\Sigma_3 = \begin{pmatrix} 1 & 0.4 \\ 0.4 & 1 \end{pmatrix}$. In this paper, we assume $\mu_1 = 0$ and $\Sigma_1 = I^{(p)}$, and we consider the following cases for the multivariate normal, multivariate t and multivariate lognormal distributions.
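As an illustration of how such random vectors can be generated, the following is a minimal Python sketch that uses only the standard library. It does not reproduce the R functions named above. It assumes the usual constructions: a multivariate normal vector from a Cholesky factor, a multivariate t vector as a normal vector divided by the square root of an independent chi-square variate over its degrees of freedom, and a multivariate lognormal vector as the componentwise exponential of a normal vector. The function names, the seed and the location vector are hypothetical; the 2 × 2 matrix with off-diagonal entries 0.4 is the one quoted in the text.

```python
import math
import random


def cholesky(sigma):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    p = len(sigma)
    low = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            acc = sigma[i][j] - sum(low[i][q] * low[j][q] for q in range(j))
            low[i][j] = math.sqrt(acc) if i == j else acc / low[j][j]
    return low


def rmv_normal(rng, mu, low):
    """One draw from N(mu, Sigma), where low is the Cholesky factor of Sigma."""
    g = [rng.gauss(0.0, 1.0) for _ in mu]
    return [m + sum(low[i][q] * g[q] for q in range(i + 1)) for i, m in enumerate(mu)]


def rmv_t(rng, mu, low, df):
    """One draw from a multivariate t: mu + N(0, Sigma) / sqrt(chi2_df / df)."""
    # A chi-square variate with df degrees of freedom is Gamma(df / 2, scale 2).
    w = math.sqrt(rng.gammavariate(df / 2.0, 2.0) / df)
    z = rmv_normal(rng, [0.0] * len(mu), low)
    return [m + zi / w for m, zi in zip(mu, z)]


def rmv_lognormal(rng, mu, low):
    """One draw from a multivariate lognormal: exp of a N(mu, Sigma) draw."""
    return [math.exp(v) for v in rmv_normal(rng, mu, low)]


rng = random.Random(2024)          # hypothetical seed
sigma = [[1.0, 0.4], [0.4, 1.0]]   # matrix quoted in the text
low = cholesky(sigma)
mu = [0.0, 0.0]                    # hypothetical location vector

normal_sample = [rmv_normal(rng, mu, low) for _ in range(5)]
t_sample = [rmv_t(rng, mu, low, df=2) for _ in range(5)]
lognormal_sample = [rmv_lognormal(rng, mu, low) for _ in range(5)]
```

A location shift for the second or third population would be introduced by changing `mu`; the specific shifts used in the simulation cases are not recoverable from the text and are therefore not encoded here.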
Case 1 Case 2

In the case of $(n_1, n_2, n_3) = (5, 5, 5)$, we used the exact critical values of the $T_1$, $T_2$ and $T_3$ statistics given by Gibbons and Chakraborti (2010), Jorn and Klotz (2002) and Murakami (2008), respectively. Since it is difficult to evaluate the exact critical values of the statistics for large sample sizes, we estimated the critical values via a permutation approach for $(n_1, n_2, n_3) = (15, 10, 5)$ and $(20, 20, 20)$. Additionally, we applied the following method to the $V^{(p)}$ statistic.
Our method for estimating the critical value is as follows:
1. Construct a dataset $Z$ by generating $N$ integers from 1 to $N$ (without ties) for each dimension.
4. Calculate the $T_1$, $T_2$, $T_3$ and $V^{(p)}$ statistics from the dataset $Z^*$.
$T_{m(CV)}$ and $V^{(p)}_{(CV)}$ are then the estimated critical values of the statistics, where $CV = B \times \alpha\%$. We simulated $B = 100,000$ replications in this study.
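The general idea of estimating a critical value by repeated resampling can be sketched in Python as follows. This is a hedged illustration, not the authors' procedure: the intermediate steps of the method are not available in the text, so the sketch assumes that each replicate dataset Z* is built from an independent random permutation of 1, ..., N in every dimension, that the replicated statistics are sorted in decreasing order, and that the CV-th largest value is taken as the critical value. The example statistic (a Kruskal-Wallis statistic on distance ranks for a fixed s) and all function names are hypothetical.

```python
import math
import random


def kw_on_distance_ranks(pooled, sizes, s=0):
    """Kruskal-Wallis H on the ranks of the distances from pooled[s].

    Integer-valued data can produce tied distances; in this sketch ties are
    broken by index order, which is an arbitrary simplification.
    """
    order = sorted(
        (math.dist(pooled[s], z), t) for t, z in enumerate(pooled) if t != s
    )
    rank = {t: r for r, (_, t) in enumerate(order, start=1)}
    m = len(pooled) - 1
    total, start = 0.0, 0
    for n in sizes:
        grp = [rank[t] for t in range(start, start + n) if t != s]
        if grp:
            total += sum(grp) ** 2 / len(grp)
        start += n
    return 12.0 / (m * (m + 1)) * total - 3.0 * (m + 1)


def estimate_critical_value(stat_fn, sizes, p, B, alpha, rng):
    """Estimate the upper-alpha critical value of stat_fn from B replicates."""
    n_total = sum(sizes)
    stats = []
    for _ in range(B):
        # One random permutation of 1..N per dimension (tie-free within a dimension).
        cols = [rng.sample(range(1, n_total + 1), n_total) for _ in range(p)]
        z_star = [tuple(col[t] for col in cols) for t in range(n_total)]
        stats.append(stat_fn(z_star, sizes))
    stats.sort(reverse=True)
    cv = max(1, round(B * alpha))  # CV = B x alpha
    return stats[cv - 1]           # CV-th largest replicated statistic


# Small B for illustration only; the paper reports B = 100,000.
critical_value = estimate_critical_value(
    kw_on_distance_ranks, sizes=(15, 10, 5), p=2, B=500, alpha=0.05,
    rng=random.Random(1),
)
```

Passing the statistic as a function keeps the resampling loop independent of the particular test, so the same loop could in principle be reused for each of the statistics considered.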
Table 1 lists the simulation results for the multivariate normal distribution.
Table 1 shows that the classical MANOVA tests were more powerful than the multivariate multisample nonparametric statistics. Among the nonparametric statistics, the proposed statistic was more efficient than the randomized nonparametric statistics. Therefore, the $V^{(p)}$ statistic was more effective than the other nonparametric statistics for parameters associated with the multivariate normal distribution.
For a non-normal distribution, we used the multivariate t distribution with 2 degrees of freedom, and the results are listed in Table 2.
Table 2 shows that the classical MANOVA tests did not maintain the 5% significance level (that is, they were not conservative) under the null hypothesis for unequal sample sizes. A test that does not maintain the significance level is not meaningful for testing the hypothesis. Moreover, the suggested statistic was more powerful than the parametric and the other nonparametric statistics. Therefore, the $V^{(p)}$ statistic was more effective than the other statistics for parameters associated with the multivariate t distribution. We used the multivariate lognormal distribution to simulate an asymmetrical distribution; the results are listed in Table 3.
The results presented in Table 3 reveal the following facts: The classical MANOVA tests did not maintain the 5% significance level (that is, they were not conservative) under the null hypothesis for unequal sample sizes. The $V^{(p)}$ statistic was the most powerful statistic for the shifted location parameters for both equal and unequal sample sizes. Therefore, the $V^{(p)}$ statistic was more effective than the other parametric and nonparametric statistics for parameters associated with the multivariate lognormal distribution.

Concluding remarks
In this paper, we considered multivariate multisample nonparametric statistics by applying the Jurečková-Kalina-type rank of distance. Simulation studies showed that the multivariate Kruskal-Wallis-type statistic, named $V^{(p)}$, was more powerful than the Kruskal-Wallis, multivariate multisample median and Lepage-type statistics for shifted location parameters under the multivariate normal, t and lognormal distributions. Additionally, the proposed statistic was more efficient than the classical MANOVA tests for equal and unequal sample sizes with non-normal distributions. As ties occur frequently in practice, future research should investigate the power of multivariate multisample nonparametric statistics under multivariate discrete distributions.
for $s = 1, \ldots, N$ under the null hypothesis. Randomization of $T_1^{(p)}, \ldots, T_N^{(p)}$ maintains the simple structure of the test. Thus, we obtain Pr(T

Table 1: Simulated power for the multivariate normal distributions

Table 1 (continued): Simulated power for the multivariate normal distributions. Case of $n_1 = 15$, $n_2 = 10$ and $n_3 = 5$ for $p = 3$