An Extensive Comparison of 50 Univariate Goodness-of-fit Tests for Normality

The assumption of normality needs to be checked for many statistical procedures, namely parametric tests, because their validity depends on it. Given the importance of this subject and the widespread development of normality tests, comprehensive descriptions and power comparisons of such tests are of considerable interest. Since recent comparison studies do not include several interesting and more recently developed tests, a further comparison of normality tests is considered to be of foremost interest. This study addresses the performance of 50 normality tests available in the literature, from 1900 until 2018. Because a theoretical comparison is not possible, Monte Carlo simulations were used, drawing from various symmetric and asymmetric distributions for sample sizes ranging from 10 to 100. The simulation results show that for symmetric distributions with support on (−∞, ∞) the Robust Jarque–Bera and Gel–Miao–Gastwirth tests generally have the most power. For asymmetric distributions with support on (−∞, ∞) the 1st Cabana–Cabana and 2nd Zhang–Wu tests have the most power. For distributions with support on (0, ∞) and distributions with support on (0, 1), the 2nd Zhang–Wu test generally has the most power.


Introduction
The problem of testing for normality is fundamental in both theoretical and empirical statistical research. Parametric statistical methods assume that the data come from a known and specific distribution, often a normal distribution. Therefore, testing normality is one of the most studied goodness-of-fit problems, and many tests are available to assess whether a data sample deviates from a normal distribution.
Given the importance of this subject and the widespread development of normality tests over the years, comprehensive descriptions and power comparisons of such tests have also been the focus of attention, thus helping the analyst in the choice of suitable tests for his/her particular needs. An extensive simulation study is presented herein to estimate the power of 50 tests for normality, from 1900 until 2018, for several alternative distributions: Beta, Gamma, Gumbel, Laplace, Skew-Normal, Student's t, Uniform, and Weibull. Since a theoretical comparison is not possible, Monte Carlo simulations were used, drawing from these alternative distributions for sample sizes ranging from 10 to 100. The alternative distributions used for the tests are explained in Section 3. The Monte Carlo simulation is explained in Section 4. Results and recommendations from the power comparisons of the 50 normality tests are discussed in Section 5.
Comparison of the normality tests has received attention in the literature. The goodness-of-fit tests have been discussed by many authors including Shapiro, Wilk, and Chen (1968), Farrell and Rogers-Stewart (2006), Yazici and Yolacan (2007), Romão, Delgado, and Costa (2010), Yap and Sim (2011), Noughabi and Arghami (2011), and Torabi, Montazeri, and Grané (2016). Since recent comparison studies do not include several interesting and more recently developed tests, a further comparison of normality tests is considered to be of foremost interest.

Tests for normality
Tests for normality can be classified according to the principle on which they are based: (i) chi-square tests, where a goodness-of-fit statistic establishes whether an observed frequency distribution differs from the theoretical one (Pearson's chi-square test); (ii) empirical distribution function tests, based on a comparison of the empirical and hypothetical distribution functions (Cramer-von Mises, Lilliefors, Frosini, among others); (iii) tests based on measures of the moments, derived from the recognition that departure from normality may be detected from the sample moments (Geary, Kurtosis test, Skewness test, D'Agostino-skewness, Spiegelhalter, 1st-4th Hosking, Bonett-Seier, 1st Bontemps-Meddahi, Robust Jarque-Bera, Desgagne-LafayeDeMicheaux X_APD, Desgagne-LafayeDeMicheaux Z_EPD); (iv) regression and correlation tests, based on the ratio of two weighted least-squares estimates of scale obtained from order statistics (Filliben, 1st Zhang Q, 2nd Zhang Q, Barrio-Cuesta-Matran-Rodriguez, Coin); (v) maximum entropy tests, based on the property that, for a given standard deviation, the normal distribution has the highest entropy of any distribution (Vasicek-Song); (vi) empirical characteristic function tests, which use the difference between the characteristic functions of the sample and of the normal distribution (Epps and Pulley); and (vii) Lagrange multiplier tests, obtained by maximizing the log-likelihood subject to a constraint (Desgagne-LafayeDeMicheaux-Leblanc test).
We give a short description of the 50 methods of testing for univariate normality. The presentation is in chronological order, from 1900 until 2018. It also contains references to definitions of these tests.
1. Pearson's chi-square test (Pearson (1900), see also Moore (1986), Hogg, McKean, and Craig (2018)) is based on the statistic χ² = Σ_{i=1}^{k} (O_i − E_i)²/E_i, where O_i is the observed count and E_i is the number of expected observations (under H_0) in class i. The classes are built in such a way that they are equiprobable under the null hypothesis of normality.
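As an illustration only (the study itself was carried out in R), the chi-square statistic with equiprobable classes can be sketched in Python. The class count k = 4 and the bisection-based normal quantile routine are choices made here for self-containment, not part of the original test definition:

```python
import math
import statistics

def phi_inv(p, lo=-10.0, hi=10.0):
    # Inverse standard normal CDF by bisection (accurate enough for illustration).
    def phi(x):
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if phi(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def pearson_chi2(x, k=4):
    # Equiprobable class boundaries under the fitted normal N(mean, sd^2).
    m, s = statistics.mean(x), statistics.stdev(x)
    cuts = [m + s * phi_inv(j / k) for j in range(1, k)]
    observed = [0] * k
    for v in x:
        observed[sum(v > c for c in cuts)] += 1  # class index 0..k-1
    expected = len(x) / k
    return sum((o - expected) ** 2 / expected for o in observed)
```

Because the classes are equiprobable under H_0, each class has the same expected count n/k, so the statistic reduces to a sum of squared deviations of the observed class counts from n/k.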
2. Cramer-von Mises test (Cramér (1928) and Mises (1931), see also Thode Jr. (2002)) is based on the statistic W² = 1/(12n) + Σ_{i=1}^{n} (p_(i) − (2i − 1)/(2n))², where p_(i) = Φ((x_(i) − x̄)/s). Here, Φ is the cumulative distribution function of the standard normal distribution, and x̄ and s are the mean and standard deviation of the data values. The p-value is computed from the modified statistic Z = W²(1.0 + 0.5/n) according to Table 4.9 in Stephens (1986).
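The Cramer-von Mises computation above can be sketched in Python (an illustration only; the study used R):

```python
import math
import statistics

def cramer_von_mises(x):
    # W^2 = 1/(12n) + sum_i (p_(i) - (2i-1)/(2n))^2, p_(i) = Phi((x_(i) - xbar)/s).
    n = len(x)
    m, s = statistics.mean(x), statistics.stdev(x)
    p = [0.5 * (1.0 + math.erf((v - m) / (s * math.sqrt(2.0)))) for v in sorted(x)]
    w2 = 1.0 / (12 * n) + sum((p[i] - (2 * i + 1) / (2 * n)) ** 2 for i in range(n))
    z = w2 * (1.0 + 0.5 / n)  # modified statistic used with Stephens' (1986) table
    return w2, z
```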
3. Geary test (Geary (1935)) for normality is based on the ratio of the mean deviation to the standard deviation, d = (1/n) Σ_{i=1}^{n} |x_i − x̄| / s; it is a simple, compact but sensitive test of normality. If the null hypothesis of normality is true, the expected value of d is approximately √(2/π) ≈ 0.7979. Thus one rejects the null hypothesis for values of d much larger or much smaller than 0.7979.
Geary's test never gained widespread usage, possibly because 0 < d ≤ √(2/π) for leptokurtic distributions, so that large increases in leptokurtosis have small numerical effects on d (Bonett and Seier (2002)).
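A minimal Python sketch of Geary's ratio (illustration only; the convention of using the n-denominator standard deviation is an assumption made here):

```python
import statistics

def geary_d(x):
    # d = mean absolute deviation about the mean / standard deviation.
    m = statistics.mean(x)
    mad = sum(abs(v - m) for v in x) / len(x)
    s = statistics.pstdev(x)  # n-denominator sd; conventions vary in the literature
    return mad / s
```

For large samples from a normal distribution, geary_d should be close to √(2/π) ≈ 0.7979.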
4. Kolmogorov-Smirnov test (Kolmogorov (1933), Smirnov (1948), see also Thode Jr. (2002), Hollander, Wolfe, and Chicken (2014)) for a given cumulative distribution function F(x) is based on the statistic D = sup_x |F_n(x) − F(x)|, where F_n(x) is the empirical distribution function of the data and F(x) is the hypothesized normal cumulative distribution function. The Kolmogorov-Smirnov test is not very powerful because it is devised to be sensitive against all possible types of differences between two distribution functions.
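The supremum in D can be evaluated at the order statistics, as this Python sketch shows (illustration only; plugging in the sample mean and standard deviation, as done here, corresponds to the Lilliefors variant described below rather than the fully specified null):

```python
import math
import statistics

def ks_statistic(x):
    # D = sup_x |F_n(x) - Phi((x - xbar)/s)|, evaluated at the order statistics.
    n = len(x)
    m, s = statistics.mean(x), statistics.stdev(x)
    d = 0.0
    for i, v in enumerate(sorted(x), start=1):
        p = 0.5 * (1.0 + math.erf((v - m) / (s * math.sqrt(2.0))))
        d = max(d, abs(i / n - p), abs((i - 1) / n - p))
    return d
```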
5. Shapiro-Wilk test (Shapiro et al. (1968), see also Thode Jr. (2002)) is based on the statistic W = (Σ_{i=1}^{n} a_i x_(i))² / Σ_{i=1}^{n} (x_i − x̄)² (1), where x_(i) is the ith order statistic and x̄ is the sample mean.
The coefficients a_i are given by (a_1, . . . , a_n) = m^T V^{−1} / (m^T V^{−1} V^{−1} m)^{1/2}, where m is the vector of expected values of the order statistics of independent and identically distributed random variables sampled from the standard normal distribution, and V is the covariance matrix of those normal order statistics.
6. Lilliefors test (Lilliefors (1967), see also Thode Jr. (2002)) is a modification of the Kolmogorov-Smirnov test for normality for the case when the mean and the variance are unknown and must be estimated from the data. The test statistic is the maximal absolute difference between the empirical and hypothetical cumulative distribution functions. It may be computed as D = max_{1≤i≤n} max{ i/n − p_(i), p_(i) − (i − 1)/n }, where p_(i) = Φ((x_(i) − x̄)/s). Here, Φ is the cumulative distribution function of the standard normal distribution, x_(i) is the ith order statistic, and x̄ and s are the mean and standard deviation of the data values.
7. Kurtosis test (Shapiro et al. (1968), see also Thode Jr. (2002)) for normality is based on the statistic b_2 = m_4/m_2² (2), where m_k = (1/n) Σ_{i=1}^{n} (x_i − x̄)^k is the kth sample central moment. Under the null hypothesis of normality, b_2 is asymptotically normal with mean 3 and variance 24/n.
8. Skewness test (Shapiro et al. (1968), see also Thode Jr. (2002)) for normality is based on the statistic √b_1 = m_3/m_2^{3/2} (3). Under the null hypothesis of normality, √b_1 is asymptotically normal with mean 0 and variance 6/n.
9. D'Agostino-skewness test (D'Agostino (1970), see also Thode Jr. (2002)) is based on a transformation of the distribution of Equation (3) to normality which works well for small sample sizes, n ≥ 8. Let Y = √b_1 [(n + 1)(n + 3)/(6(n − 2))]^{1/2} and B_2 = 3(n² + 27n − 70)(n + 1)(n + 3) / ((n − 2)(n + 5)(n + 7)(n + 9)). With W² = −1 + [2(B_2 − 1)]^{1/2}, δ = 1/(ln W)^{1/2} and α = [2/(W² − 1)]^{1/2}, the test statistic is Z(√b_1) = δ ln(Y/α + [(Y/α)² + 1]^{1/2}) (4), which is approximately standard normal under normality.
10. Shapiro-Francia test (Shapiro and Francia (1972)) is a modification of the Shapiro-Wilk test of Equation (1). It is defined as W' = (Σ_{i=1}^{n} b_i x_(i))² / Σ_{i=1}^{n} (x_i − x̄)², where b = m^T/(m^T m)^{1/2}. Under the null hypothesis that the data are drawn from a normal distribution, this correlation will be strong, so W' values will cluster just under 1, with the peak becoming narrower and closer to 1 as n increases. If the data deviate strongly from a normal distribution, W' will be smaller. Monte Carlo simulations have shown that the transformed statistic ln(1 − W') is nearly normally distributed.
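For concreteness, the moment quantities √b_1 and b_2 used by the skewness and kurtosis tests can be computed directly. This Python sketch (an illustration; the study itself used R) follows the n-denominator central-moment convention:

```python
def sample_skew_kurt(x):
    # sqrt(b1) = m3 / m2^(3/2),  b2 = m4 / m2^2, with n-denominator central moments m_k.
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2
```

A symmetric sample has √b_1 = 0; a normal sample has b_2 close to 3 for large n.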
11. The D'Agostino-Pearson test (see Thode Jr. (2002)) combines the transformed skewness and kurtosis into the omnibus statistic K² = [Z(√b_1)]² + [Z(b_2)]². The normality hypothesis of the data is rejected for large values of the test statistic K². The test statistic K² is approximately chi-squared distributed with two degrees of freedom.
12. Filliben test (Filliben (1975)) uses the correlation between the sample order statistics and estimated median values of the theoretical order statistics. For a sample of size n, Filliben used estimated order-statistic medians m_(i) from a uniform distribution, together with the transformation M_(i) = Φ^{−1}(m_(i)), to obtain an estimate of the median value of the ith normal order statistic. The correlation coefficient r between the ordered sample values x_(i) and the M_(i) is then computed, leading to the rejection of the normality hypothesis of the data for small values of r.
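Filliben's correlation can be sketched in Python as follows (an illustration only; the endpoint-adjusted median approximation m_(i) = (i − 0.3175)/(n + 0.365) used here is the commonly cited one, and should be treated as an assumption rather than a quotation of the original paper):

```python
import math

def _phi_inv(p):
    # Inverse standard normal CDF by bisection (sufficient for illustration).
    lo, hi = -10.0, 10.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if 0.5 * (1.0 + math.erf(mid / math.sqrt(2.0))) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def filliben_r(x):
    n = len(x)
    xs = sorted(x)
    # Estimated medians of uniform order statistics, with adjusted endpoints:
    m = [(i - 0.3175) / (n + 0.365) for i in range(1, n + 1)]
    m[0] = 1.0 - 0.5 ** (1.0 / n)
    m[-1] = 0.5 ** (1.0 / n)
    M = [_phi_inv(p) for p in m]  # estimated medians of normal order statistics
    mx, mM = sum(xs) / n, sum(M) / n
    num = sum((a - mx) * (b - mM) for a, b in zip(xs, M))
    den = math.sqrt(sum((a - mx) ** 2 for a in xs) * sum((b - mM) ** 2 for b in M))
    return num / den
```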
13. The Hegazy-Green-1 test (Hegazy and Green (1975)) for normality is based on the statistic T_1 = (1/n) Σ_{i=1}^{n} |x_(i) − m_i|, where the x_(i) are the standardized order statistics and the m_i are approximations to their expected values under normality.
14. The Hegazy-Green-2 test (Hegazy and Green (1975)) for normality uses the corresponding sum of squares, T_2 = (1/n) Σ_{i=1}^{n} (x_(i) − m_i)².
15. The Weisberg-Bingham test (Weisberg and Bingham (1975)) for normality is a version of the Shapiro-Francia test in which the expected normal order statistics are replaced by the approximation m̃_i = Φ^{−1}((i − 3/8)/(n + 1/4)).
16. Vasicek-Song (Vasicek (1976), Kai-Sheng Song (2002)) developed a test for normality using an estimate of the sample entropy for n > 3. The entropy of a density f(x) is H(f) = −∫ f(x) ln f(x) dx. An estimate of H(f) can be calculated as H_{mn} = (1/n) Σ_{i=1}^{n} ln[(n/(2m))(x_(i+m) − x_(i−m))], where m is a positive integer, m < n/2, and x_(k) = x_(1) for k < 1 and x_(k) = x_(n) for k > n. Among all densities with a given variance σ², H(f) is maximized by the normal density, for which H(f) = ln(σ√(2πe)); for all other f(x) the entropy is strictly smaller, equality being attained only under normality. Therefore, an omnibus test for a sample of size n is defined by rejecting the null hypothesis if K_{mn} = exp(H_{mn})/s < K*, where K* is the appropriate critical value for the test and s is the sample standard deviation.
17. The Spiegelhalter test (Spiegelhalter (1977)) for normality is based on a statistic that combines the most powerful location- and scale-invariant test statistics against uniform and double-exponential alternatives.
18. Martinez and Iglewicz (Martinez and Iglewicz (1981)) have proposed a normality test based on the ratio of two estimators of variance, where one of the estimators is the robust biweight scale estimator S_b, in which M is the sample median, z_i = (x_i − M)/(9A), with A being the median of |x_i − M|, and z_i is set to 0 when |z_i| > 1. The Martinez and Iglewicz test statistic I_n is then given by the ratio of the usual variance estimator to S_b², and the normality hypothesis of the data is rejected for large values of I_n. Martinez and Iglewicz (1981) have shown that this test is very powerful for heavy-tailed symmetric distributions.
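The spacing-based entropy estimate H_{mn} used by the Vasicek-Song test (item 16 above) can be sketched in Python (an illustration only; the study used R). A useful sanity check is that rescaling the data by a factor c shifts the estimate by exactly ln c:

```python
import math

def vasicek_entropy(x, m):
    # H_mn = (1/n) * sum_i ln( n/(2m) * (x_(i+m) - x_(i-m)) ),
    # with x_(k) clamped to x_(1) for k < 1 and to x_(n) for k > n.
    n = len(x)
    xs = sorted(x)
    total = 0.0
    for i in range(1, n + 1):
        hi = xs[min(i + m, n) - 1]
        lo = xs[max(i - m, 1) - 1]
        total += math.log(n / (2.0 * m) * (hi - lo))
    return total / n
```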
19. The Epps and Pulley (Epps and Pulley (1983)) test statistic T_EP is based on a weighted integral of the squared modulus of the difference ϕ_n(t) − ϕ_0(t), where ϕ_n(t) is the empirical characteristic function given by n^{−1} Σ_{j=1}^{n} exp(i t x_j), ϕ_0(t) is the sample estimate of the characteristic function of the normal distribution given by exp(i t x̄ − 0.5 m_2 t²), and G(t) is an adequate weight function chosen according to several considerations in Epps and Pulley (1983). By setting dG(t) = g(t) dt and selecting g(t) = [m_2/(2π)]^{1/2} exp(−0.5 m_2 t²), a computable statistic is obtained, and the normality hypothesis of the data is rejected when large values of T_EP are obtained.
20. Anscombe-Glynn (Anscombe and Glynn (1983)) test of kurtosis for normal samples is based on the transformed kurtosis of Equation (5). Under the hypothesis of normality, data should have kurtosis equal to 3.

21. Anderson-Darling test (Stephens (1986)) is an empirical distribution function omnibus test for the composite hypothesis of normality. The test statistic is A² = −n − (1/n) Σ_{i=1}^{n} (2i − 1)[ln p_(i) + ln(1 − p_(n+1−i))], where p_(i) = Φ((x_(i) − x̄)/s). Here, Φ is the cumulative distribution function of the standard normal distribution, and x̄ and s are the mean and standard deviation of the data values.
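The Anderson-Darling statistic above translates directly into a short Python sketch (illustration only; the study used R):

```python
import math
import statistics

def anderson_darling(x):
    # A^2 = -n - (1/n) * sum_i (2i - 1) * [ln p_(i) + ln(1 - p_(n+1-i))],
    # with p_(i) = Phi((x_(i) - xbar)/s).
    n = len(x)
    m, s = statistics.mean(x), statistics.stdev(x)
    p = [0.5 * (1.0 + math.erf((v - m) / (s * math.sqrt(2.0)))) for v in sorted(x)]
    return -n - sum((2 * i + 1) * (math.log(p[i]) + math.log(1.0 - p[n - 1 - i]))
                    for i in range(n)) / n
```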
22. Frosini test (Frosini (1987)) for normality is based on the statistic B_n = n^{−1/2} Σ_{i=1}^{n} |Φ((x_(i) − x̄)/s) − (i − 0.5)/n|.
23. Jarque-Bera (Jarque and Bera (1987)) test for normality is based on the statistic JB = n[(√b_1)²/6 + (b_2 − 3)²/24], where √b_1 and b_2 are the sample skewness of Equation (3) and the sample kurtosis of Equation (2), respectively. H_0 is rejected for large values of JB. The Jarque-Bera test is a large-sample test and may not be appropriate in small samples.
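The Jarque-Bera statistic is simple enough to compute from the sample moments directly, as in this Python sketch (illustration only; the study used R):

```python
def jarque_bera(x):
    # JB = n * ( (sqrt(b1))^2 / 6 + (b2 - 3)^2 / 24 ).
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    return n * (skew ** 2 / 6.0 + (kurt - 3.0) ** 2 / 24.0)
```

For a symmetric sample the skewness term vanishes and JB is driven entirely by the kurtosis term.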
24. The 1st Hosking test (Hosking (1990)). Hosking has shown that the rth order sample L-moment can be estimated from linear combinations of the order statistics. Based on the second, third and fourth sample L-moments, which have similarities with the corresponding central moments, Hosking (1990) also defines new measures of skewness and kurtosis, termed L-skewness τ_3 and L-kurtosis τ_4. The value of τ_3 is bounded between −1 and 1 for all distributions and is close to zero for the normal distribution, while the value of τ_4 is ≤ 1 for all distributions and is close to 0.1226 for the normal distribution. Hosking has suggested that normality could be tested based on τ_3 and τ_4 according to the statistic T_Lmom = (τ_3 − µ_τ3)²/var(τ_3) + (τ_4 − µ_τ4)²/var(τ_4) (6), where µ_τ3 and µ_τ4 are the means of τ_3 and τ_4, and var(τ_3) and var(τ_4) are their corresponding variances. Under normality, µ_τ3 and µ_τ4 are expected to be close to 0 and 0.1226, respectively. The normality hypothesis of the data is rejected for large values of T_Lmom.
25. The 2nd Hosking test (Hosking (1990)). Although L-moments exhibit some robustness towards outliers in the data, as previously referred, they may still be affected by extreme observations (Elamir and Seheult (2003)). A robust generalization of the sample L-moments has, therefore, been formulated by Elamir and Seheult (Elamir and Seheult (2003)), leading to the development of trimmed L-moments. The proposed formulation for the trimmed L-moments allows for both symmetric and asymmetric trimming of the smallest and largest sample observations. For the case of normality testing suggested herein, only symmetric trimming is considered.
Considering an integer symmetric trimming level t, Elamir and Seheult (Elamir and Seheult (2003)) have shown that the rth order sample trimmed L-moment l_r^(t) can be estimated from the order statistics. Based on the second, third and fourth sample trimmed L-moments, Elamir and Seheult (2003) also define new measures of skewness and kurtosis, termed TL-skewness τ_3^(t) and TL-kurtosis τ_4^(t). Based on these new measures, a test statistic T_Lmom^(t), similar to that given by Equation (6), can be constructed. The 2nd Hosking test corresponds to the symmetric trimming level t = 1; the normality hypothesis of the data is rejected for large values of the statistic T_Lmom^(1). 26. The 3rd Hosking test (Hosking (1990)) corresponds to the symmetric trimming level t = 2; the normality hypothesis of the data is rejected for large values of the statistic T_Lmom^(2). 27. The 4th Hosking test (Hosking (1990)) corresponds to the symmetric trimming level t = 3; the normality hypothesis of the data is rejected for large values of the statistic T_Lmom^(3).
28. The 1st Cabana-Cabana test (Cabana and Cabana (1994)). The Cabana-Cabana test statistics are based on the definition of approximate transformed estimated empirical processes (ATEEP) sensitive to changes in skewness or kurtosis. The proposed ATEEP sensitive to changes in skewness is defined as: where L is a dimensionality parameter, φ (x) is the probability density function of the standard normal distribution, H j (·) represents the j th order normalized Hermite polynomial given by and H j is the jth order normalized mean of the Hermite polynomial defined as The proposed ATEEP sensitive to changes in kurtosis is defined as According to (Cabana and Cabana (1994)), the dimensionality parameter L ensures that the test is consistent against alternative distributions differing from the normal distribution having the same mean and variance in at least one moment of order not greater than L + 3. The Kolmogorov-Smirnov type test statistics sensitive to changes in skewness and in kurtosis, T S,L and T K,L respectively, are defined as and Based on results presented in (Cabana and Cabana (1994)), parameter L was considered to be five.
The 1st Cabana-Cabana test for normality is based on the statistic T_S,L of Equation (8); the normality hypothesis of the data is rejected for large values of the test statistic.
29. The 2nd Cabana-Cabana test (Cabana and Cabana (1994)) is based on the statistic T_K,L of Equation (9); the normality hypothesis of the data is rejected for large values of the test statistic.
30. Chen-Shapiro test (Chen and Shapiro (1995)) is based on normalized spacings, in which M_i is the ith quantile of a standard normal distribution obtained by Φ^{−1}[(i − 0.375)/(n + 0.25)]. The normality hypothesis of the data is rejected for small values of CS. 31. Adjusted Jarque-Bera test (Urzua (1996)) for normality modifies the Jarque-Bera statistic by replacing the asymptotic means and variances of the sample skewness √b_1 of Equation (3) and the sample kurtosis b_2 of Equation (2) with their exact small-sample counterparts.
32. Rahman and Govindarajulu (Rahman and Govindarajulu (1997)) have proposed a modification to the Shapiro-Wilk test, hereafter termed W_RG, which relies on a new definition of the weights a_i, where it is assumed that m_0 φ(m_0) = m_{n+1} φ(m_{n+1}) = 0. With this modification, the new test statistic W_RG assigns larger weights to the extreme order statistics than the original W test, which results in higher power against short-tailed alternative distributions. As for the original W test, the normality hypothesis of the data is rejected for small values of W_RG.
33. The 1st Zhang Q test (Zhang (1999)) is based on the ratio of two unbiased estimators of standard deviation, q_1 and q_2, with Q = ln(q_1/q_2). The estimators q_1 and q_2 are linear combinations of the order statistics, with ith order linear coefficients a_i and b_i determined from the expected values of the order statistics of a standard normal distribution. According to Zhang (1999), Q is less powerful against negatively skewed distributions. Q approximately follows a normal distribution.
Based on the definition of Q, the normality hypothesis of the data is rejected for both small and large values of the statistic using a two-sided test. 34. The 2nd Zhang Q test (Zhang (1999)) is based on a companion statistic constructed from the same pair of unbiased standard-deviation estimators and directed at the alternatives against which Q has low power; the normality hypothesis is likewise rejected for both small and large values of the statistic using a two-sided test.
35. The Barrio-Cuesta-Matran-Rodriguez test (del Barrio, Cuesta-Albertos, Matrán, and Rodríguez-Rodríguez (1999)) for normality is based on the L₂-Wasserstein distance between the sample distribution and the set of normal distributions as a measure of non-normality; the numerator of the test statistic is the squared L₂-Wasserstein distance and the denominator is the sample standardized second moment m_2. The normality hypothesis of the data is rejected for large values of the test statistic.
36. Glen-Leemis-Barr test (Glen, Leemis, and Barr (2001)) for normality is based on the quantiles of the order statistics, given the relation between the order statistics and the empirical distribution function. The test statistic is computed from the p_(i), the elements of the vector p containing the quantiles of the order statistics sorted in ascending order. The elements of p can be obtained by defining a vector u, with elements sorted in ascending order and given by u_(i) = Φ(z_(i)). Considering that u_(1), u_(2), . . . , u_(n) represent the order statistics of a sample taken from a uniform distribution U(0, 1), their quantiles, which correspond to the elements of p, can be determined knowing that u_(i) follows a Beta distribution B(i, n − i + 1). The normality hypothesis of the data is rejected for large values of the test statistic.
37. The Bonett-Seier test (Bonett and Seier (2002)) is based on a modification of Geary's measure of kurtosis (Geary (1935)), Z_w. Z_w approximately follows a standard normal distribution. The normality hypothesis H_0 is rejected for both small and large values of Z_w using a two-sided test.
38. The Brys-Hubert-Struyf MC-LR test (Brys, Hubert, and Struyf (2008)) is based on robust measures of skewness and tail weight. The considered robust measure of skewness is the medcouple MC, defined through a kernel function h applied to observations on either side of the sample median m_F (med stands for median), with a separate definition of the kernel when observations coincide with the median. The left medcouple (LMC) and the right medcouple (RMC) are the considered robust measures of left and right tail weight, respectively. The test statistic T_MC−LR is then defined as T_MC−LR = n(w − ω)^t V^{−1}(w − ω), in which w is set as [MC, LMC, RMC]^t, and ω and V are obtained based on the influence function of the estimators in w; for the case of a normal distribution, ω and V take known values. The normality hypothesis of the data is rejected for large values of T_MC−LR. Note that T_MC−LR approximately follows the chi-square distribution with three degrees of freedom.
39. The Brys-Hubert-Struyf-Bonett-Seier joint test. Since the Brys-Hubert-Struyf MC-LR test (Brys et al. (2008)) is a skewness-associated test and the Bonett-Seier test (Bonett and Seier (2002)) is a kurtosis-based test, a joint test considering both these measures, termed T_MC−LR − Z_w, is proposed herein for testing normality. This joint test attempts to make use of the two focused tests in order to increase the power to detect different kinds of departure from normality, and is based on the assumption that the individual tests can be considered independent. The normality hypothesis of the data is rejected for the joint test when rejection is obtained for either one of the two individual tests at a significance level of α/2.

40. The 1st Zhang-Wu test (Zhang and Wu (2005)) for normality is based on the statistic Z_C; the normality hypothesis of the data is rejected for large values of the test statistic.
41. The 2nd Zhang-Wu test (Zhang and Wu (2005)) for normality is based on the statistic Z_A; the normality hypothesis of the data is rejected for large values of the test statistic.
42. The 1st Bontemps-Meddahi test. Bontemps and Meddahi (Bontemps and Meddahi (2005)) proposed a family of normality tests based on moment conditions known as Stein equations and their relation with Hermite polynomials. The test statistics are developed using the generalized method of moments approach associated with Hermite polynomials, which leads to test statistics that are robust against parameter uncertainty. The general expression of the test family is given by BM_{3−p} = Σ_{k=3}^{p} [n^{−1/2} Σ_{i=1}^{n} H_k(z_i)]² (12), where z_i = (x_i − x̄)/s and H_k(·) represents the kth order normalized Hermite polynomial having the general expression given by Equation (7). The general BM_{3−p} family of tests asymptotically follows the chi-square distribution with p − 2 degrees of freedom. It can be seen from Equation (12) that a number of different tests can be obtained by assigning different values to p, which represents the maximum order of the considered normalized Hermite polynomials. Two different tests are considered in Bontemps and Meddahi (2005), termed BM_{3−4} and BM_{3−6}. The 1st Bontemps-Meddahi test is BM_{3−4}; the normality hypothesis of the data is rejected for large values of the test statistic.
43. The 2nd Bontemps-Meddahi test for normality (Bontemps and Meddahi (2005)) is BM_{3−6}; the normality hypothesis of the data is rejected for large values of the test statistic.
44. The Gel-Miao-Gastwirth test for normality (Gel, Miao, and Gastwirth (2007)) is based on the ratio of the standard deviation s and the robust measure of dispersion J_n = (√(π/2)/n) Σ_{i=1}^{n} |x_i − M| (13), in which M is the sample median. The normality test statistic is therefore R_sJ = s/J_n, which should tend to one under a normal distribution. The normality hypothesis of the data is rejected for large values of R_sJ, and the statistic √n(R_sJ − 1) asymptotically follows the normal distribution N(0, π/2 − 1.5). The Gel-Miao-Gastwirth test has higher power against heavy-tailed alternatives. 45. The Doornik-Hansen test for normality (Doornik and Hansen (2008)) is based on the transformed skewness z_1 of Equation (4) and on a transformed kurtosis z_2, obtained from a gamma approximation to the distribution of b_2 conditional on √b_1 (see Doornik and Hansen (2008)). The statistic of the Doornik-Hansen test is DH = z_1² + z_2².
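The ratio R_sJ = s/J_n can be sketched in Python as follows (an illustration only; the choice of the (n − 1)-denominator sample standard deviation for s is an assumption made here, as conventions vary). Note that R_sJ is scale-invariant, which the test below exploits:

```python
import math
import statistics

def gmg_rsj(x):
    # J_n = (sqrt(pi/2)/n) * sum_i |x_i - M|, with M the sample median; R_sJ = s / J_n.
    n = len(x)
    med = statistics.median(x)
    jn = math.sqrt(math.pi / 2.0) / n * sum(abs(v - med) for v in x)
    s = statistics.stdev(x)  # sample sd; the exact convention is an assumption here
    return s / jn
```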
The null hypothesis H_0 is rejected for large values of DH.
46. The Coin test (Coin (2008)) for normality is based on a polynomial regression focused on detecting symmetric non-normal alternative distributions. According to Coin (2008), the analysis of standard normal Q-Q plots of different symmetric non-normal distributions suggests fitting a model of the type x_(i) = β_1 α_i + β_3 α_i³, where β_1 and β_3 are fitting parameters and the α_i represent the expected values of standard normal order statistics; this leads to values of β_3 different from zero in the presence of symmetric non-normal distributions. Therefore, Coin (2008) suggests the use of β_3² as a statistic for testing normality, thus rejecting the normality hypothesis of the data for large values of β_3². The values of α_i are obtained using the approximations provided in Royston (1982).
47. The Gel-Gastwirth Robust Jarque-Bera test. Gel and Gastwirth (Gel and Gastwirth (2008)) proposed a modification of the Jarque-Bera test that uses a robust estimate of dispersion in the skewness and kurtosis terms instead of the second central moment m_2. The selected robust dispersion measure is the average absolute deviation from the median, which leads to the Robust Jarque-Bera statistic RJB = (n/C_1)(m_3/J_n³)² + (n/C_2)(m_4/J_n⁴ − 3)², with J_n obtained from Equation (13) and constants C_1 and C_2 given by Gel and Gastwirth (2008). The normality hypothesis of the data is rejected for large values of the test statistic. RJB asymptotically follows the chi-square distribution with two degrees of freedom. The RJB test is more powerful in detecting moderately heavy-tailed departures from normality, especially in small and moderate samples.
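A Python sketch of the Robust Jarque-Bera statistic follows (illustration only; the constants C_1 = 6 and C_2 = 64 used below are the values commonly reported for Gel and Gastwirth (2008) and should be treated as an assumption here):

```python
import math
import statistics

def robust_jarque_bera(x):
    # RJB = (n/C1) * (m3/Jn^3)^2 + (n/C2) * (m4/Jn^4 - 3)^2, C1 = 6, C2 = 64 (assumed).
    n = len(x)
    med = statistics.median(x)
    jn = math.sqrt(math.pi / 2.0) / n * sum(abs(v - med) for v in x)
    mean = sum(x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return n / 6.0 * (m3 / jn ** 3) ** 2 + n / 64.0 * (m4 / jn ** 4 - 3.0) ** 2
```

Like the classical Jarque-Bera statistic, RJB is nonnegative and scale-invariant.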
48. The Desgagne-LafayeDeMicheaux-Leblanc test (Desgagné, de Micheaux, and Leblanc (2013)) for normality is tailored to detect departures from normality in the tails of the distribution. Under the null hypothesis, the test statistic R_n approximately follows the chi-square distribution with three degrees of freedom.
49. The Desgagne-LafayeDeMicheaux X_APD test for normality (Desgagné and de Micheaux (2018)) for finite sample sizes n ≥ 10 combines the '2nd-power skewness' and '2nd-power kurtosis', denoted by B_2 and K_2 respectively and defined by Desgagné and de Micheaux (2018), where γ is the Euler-Mascheroni constant. X_APD approximately follows the chi-square distribution with two degrees of freedom for all n ≥ 10.
50. The Desgagne-LafayeDeMicheaux Z_EPD test for normality (Desgagné and de Micheaux (2018)) for finite sample sizes n ≥ 10 is based on K_2 as given in Equation (14) and γ as given in Equation (15). Z_EPD approximately follows the standard normal distribution N(0, 1) for all n ≥ 10.

Statistical distributions used in the simulation study
The simulation study uses a number of alternative statistical distributions over which the performance of the presented normality tests is to be assessed. The selected alternative distributions were chosen in order to be a representative set exhibiting different values of important properties such as skewness and kurtosis. Following Esteban, Castellanos, Morales, and Vajda (2001), these alternative distributions are categorized into four groups, depending on the support and shape of their densities, as follows: Group I, symmetric distributions with support on (−∞, ∞); Group II, asymmetric distributions with support on (−∞, ∞); Group III, distributions with support on (0, ∞); and Group IV, distributions with support on (0, 1).

Simulation study
Since a theoretical comparison is not possible, power comparisons of tests for normality are made using Monte Carlo simulation. To compare the power of the tests we generate samples of sizes n = 10, 30, 50, 70 and 100 from the alternative distributions in Section 3. The number of simulations is 10,000 and the level of significance is α = 0.05. We compute the power of a test as the proportion of times we correctly reject the null hypothesis in the 10,000 replications at the α = 0.05 level of significance. The simulations and the estimated powers of the tests for normality were computed with the R language (R Core Team (2021)) and R packages. The following R packages are used to test normality: DescTools (Signorell (2020)), evd (Stephenson (2002)), fBasics (Wuertz, Setz, and Chalabi (2020)), lawstat (Gastwirth, Gel, Hui, Lyubchich, Miao, and Noguchi (2020)), moments (Komsta and Novomestky (2015)), normtest (Gavrilov and Pusev (2014)), nortest (Gross and Ligges (2015)), PoweR (Lafaye de Micheaux and Tran (2016)), rmutil (Swihart and Lindsey (2020)), sn (Azzalini (2021)), and vsgoftest (Lequesne and Regnault (2020)). Tables 1 through 9 report the estimates of the power of the 50 tests for normality, in order of increasing power, under the alternative distributions in Section 3.
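The power-estimation procedure described above can be sketched as follows (the study itself was carried out in R; this Python sketch uses a hypothetical statistic, critical value, and alternative sampler purely for illustration, not the study's actual configuration):

```python
import random

def power_estimate(test_stat, critical, sampler, n, reps=2000, seed=1):
    # Estimated power = proportion of replications in which the test rejects H0,
    # i.e. in which the statistic exceeds the given critical value.
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        x = [sampler(rng) for _ in range(n)]
        if test_stat(x) > critical:
            rejections += 1
    return rejections / reps

def abs_kurtosis_stat(x):
    # A hypothetical kurtosis-style statistic: |b2 - 3|.
    n = len(x)
    m = sum(x) / n
    m2 = sum((v - m) ** 2 for v in x) / n
    m4 = sum((v - m) ** 4 for v in x) / n
    return abs(m4 / m2 ** 2 - 3.0)
```

For example, `power_estimate(abs_kurtosis_stat, 1.5, lambda r: r.expovariate(1.0), 50)` estimates the rejection rate of this illustrative statistic against a standard exponential alternative; the critical value 1.5 is a made-up placeholder, whereas the study obtained critical values at α = 0.05.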
The difference in the power of the tests becomes more apparent when the comparison is carried out graphically. Figures 1 through 17 present the simulated power curves for the 50 normality tests under the alternative distributions in Section 3 for sample sizes n = 10, 30, 50, 70 and 100, based on the results of Tables 1 through 9. The vertical axis of each figure measures the simulated power of the tests for normality and the horizontal axis represents the sample size n. Table 10 ranks, from first to tenth, the normality tests with the most power obtained from Tables 1-9 for the four groups of alternative distributions, respectively. Table 11 ranks, from first to tenth, the normality tests with the least power obtained from Tables 1-9 for the four groups of alternative distributions, respectively.

Results and recommendations
For Group I, symmetric distributions with support on (−∞, ∞), as shown in Tables 1-3, Tables 10-11 and Figures 1-5, the tests Geary and Kolmogorov-Smirnov have the least power and the tests Robust Jarque-Bera and Gel-Miao-Gastwirth have the most power.
For Group II, asymmetric distributions with support on (−∞, ∞), as shown in Tables 4-5, Tables 10-11 and Figures 7-9, the test Kolmogorov-Smirnov has the least power and the test 2nd Zhang-Wu has the most power.
For Group III, distributions with support on (0, ∞), as shown in Tables 6-7, Tables 10-11 and Figures 11-13, the tests Kolmogorov-Smirnov and Geary have the least power and the test 2nd Zhang-Wu has the most power.
For Group IV, distributions with support on (0, 1), as shown in Tables 8-9, Tables 10-11 and Figures 15-17, the test Kolmogorov-Smirnov has the least power and the tests 1st Zhang-Wu and 2nd Zhang-Wu have the most power.
In terms of the selected normality tests based on the empirical distribution function, for the case of the symmetric distributions, the test Hegazy-Green-2 has the most power and the tests Kolmogorov-Smirnov and Lilliefors have the least power (see Figures 2, 4, and 6). For the asymmetric distributions, the tests 1st Zhang-Wu and 2nd Zhang-Wu have the most power and the tests Kolmogorov-Smirnov and Lilliefors have the least power (see Figures 8 and 10). For distributions with support on (0, ∞) and distributions with support on (0, 1), the tests 1st Zhang-Wu and 2nd Zhang-Wu have the most power and the tests Kolmogorov-Smirnov and Lilliefors have the least power (see Figures 12, 14, 16 and 18).
In terms of the selected normality tests based on measures of the moments, the tests Robust Jarque-Bera and Gel-Miao-Gastwirth generally have the most power and the tests Geary and Brys-Hubert-Struyf have the least power for the symmetric distributions (see Figures 2, 4, and 6). For the case of the asymmetric distributions, the test 1st Cabana-Cabana has the most power and the test Geary has the least power (see Figures 8 and 10). For distributions with support on (0, ∞), the tests 1st Cabana-Cabana and Skewness have the most power and the tests Geary and 4th Hosking have the least power (see Figures 12 and 14). For distributions with support on (0, 1), the test 1st Cabana-Cabana has the most power and the test 4th Hosking has the least power (see Figures 16 and 18).
In terms of regression and correlation tests, for the case of the symmetric distributions, the tests Filliben and Shapiro-Francia have the most power and the tests 1st Zhang Q and Rahman-Govindarajulu have the least power (see Figures 2, 4, and 6). For the asymmetric distributions, the tests Shapiro-Wilk and Chen-Shapiro have the most power and the test Coin has the least power (see Figures 8 and 10). For distributions with support on (0, ∞) and distributions with support on (0, 1), the tests Shapiro-Wilk and Chen-Shapiro have the most power and the tests Coin and 2nd Zhang Q have the least power (see Figures 12, 14, 16 and 18).