Easily Changeable Kurtosis Distribution

The goal of this paper is to introduce the easily changeable kurtosis (ECK) distribution. The uniform distribution appears as a special case of the ECK distribution. The new distribution tends to the normal distribution. Properties of the ECK distribution such as PDF, CDF, modes, inflection points, quantiles, moments, moment generating function, Moors' measure, moments of order statistics, random number generator and the Fisher Information Matrix are derived. The unknown parameters of the ECK distribution are estimated by the maximum likelihood method. The Shannon, Renyi and Tsallis entropies are calculated. Illustrative examples of applicability and flexibility of the ECK distribution are given. The most important R codes are presented in the Appendix.


Introduction
The article presents a flexible, symmetric distribution defined in the finite domain. It is named the easily changeable kurtosis (ECK) distribution. Its special case is the uniform distribution. The ECK distribution tends to the normal distribution. The various properties of the ECK distribution are presented. Undoubtedly, symmetric distributions do not form as big a family as asymmetric distributions. Table 1 and Table 2 present (in alphabetical order) symmetric distributions defined in the infinite and finite domains, respectively, with the formulas for the excess kurtosis $\gamma_2$ and the number of modes. Instead of the kurtosis, the paper analyzes the excess kurtosis $\gamma_2$ (the kurtosis minus 3), which can be positive or negative.
The author would like to emphasize that the correctness of the formulas obtained in the Mathematica software has been checked using numerical methods.
The proposed distribution, as will be shown further in this paper, can be classified simultaneously into groups 1, 9 and 11, just like the Q-Gaussian distribution defined in the finite domain. Both distributions are characterized by a simple excess kurtosis formula (see group 7), and their special cases are the uniform and normal distributions. The ECK distribution can be used to model an excess kurtosis in the range (−2, 0). This distribution can be extremely useful when one wants to seamlessly test the ability of goodness-of-fit tests (GoFTs) to detect deviations from normality caused by a negative excess kurtosis. The new proposition will also be used as a component of a mixed distribution when fitting distributions to data.
It should also be mentioned that there is a group of asymmetric distributions that are symmetric for certain parameter values, e.g. the truncated normal, Birnbaum-Saunders (Birnbaum and Saunders 1969), skew-normal (Azzalini 1985), beta, two-piece normal (Gibbons and Mylroie 1985) and two-piece power normal (Sulewski 2019b) distributions.
This paper is organized as follows. Section 2 presents the main properties of the ECK distribution such as PDF, CDF, modes, inflection points, quantiles, moments, moment generating function, Moors' measure, moments of order statistics, instructions to generate ECK random numbers and the Fisher Information Matrix. The estimation procedures are provided in Section 3, while the entropy is presented in Section 4. The paper ends with applications and conclusions. The most important R codes are given in the Appendix.
2. Main properties of introduced distribution
2.1. Distribution and density functions
Definition 1. The distribution of the random variable X with PDF given by
$$f(x; a, p) = \frac{1}{a\,B(0.5, p+1)} \left(1 - \frac{x^2}{a^2}\right)^{p}, \qquad (1)$$
with $x \in [-a, a]$ for $p \ge 0$ and $x \in (-a, a)$ for $-1 < p < 0$, is called the easily changeable kurtosis (ECK) distribution, where B is the beta function, a > 0 is the scale parameter and p > −1 is the shape parameter. The ECK(a > 0, p > −1) distribution is symmetric around zero since, based on (1), f(x; a, p) = f(−x; a, p) (Figure 2). The R codes of the dECK function for computing the PDF are provided in the Appendix.
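The Appendix code is in R and is not reproduced in this section, so the following is only a minimal Python sketch of the density, using the standard library alone. The function name `eck_pdf` is an assumption made for illustration and is not the paper's `dECK`; the normalising constant is evaluated through log-gamma values so that large shape parameters (such as p = 170) do not overflow.

```python
import math

def eck_pdf(x, a, p):
    """Density (1 - x^2/a^2)^p / (a * B(0.5, p + 1)) of ECK(a, p)."""
    if a <= 0 or p <= -1:
        raise ValueError("require a > 0 and p > -1")
    # Support is [-a, a] for p >= 0 and the open interval (-a, a) for p < 0.
    if abs(x) > a or (abs(x) == a and p < 0):
        return 0.0
    # log B(0.5, p + 1) = lgamma(0.5) + lgamma(p + 1) - lgamma(p + 1.5)
    log_beta = math.lgamma(0.5) + math.lgamma(p + 1.0) - math.lgamma(p + 1.5)
    return (1.0 - (x / a) ** 2) ** p / (a * math.exp(log_beta))
```

For p = 0 the function returns the constant 1/(2a), which is the uniform special case mentioned in the text.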
The standard deviation of the new proposal, based on (28), equals $\sqrt{a^2/(2p+3)}$; therefore the ECK(a, p) distribution tends to the normal distribution $N\left(0, a\sqrt{1/(2p+3)}\right)$ as p increases. Let M (2) be the similarity measure of these distributions (Sulewski 2019a). For a = 4 and p = 170 we have M = 0.999. The similarity measure M takes values in (0, 1], with M = 1 if the PDFs are identical.
Proof. Let p ≥ 0. The support [−a, a] can be split into two intervals, [−a, 0] ∪ (0, a], on each of which $y = x^2/a^2$ is strictly monotonic. The inverse functions of $y = x^2/a^2$ on these intervals are $x = -a\sqrt{y}$ and $x = a\sqrt{y}$, with $|dx/dy| = a/(2\sqrt{y})$. We can write the PDF of Y as
$$g(y; p) = \frac{a}{2\sqrt{y}} \left[ f(-a\sqrt{y}; a, p) + f(a\sqrt{y}; a, p) \right].$$
From (1) we have
$$f(-a\sqrt{y}; a, p) = \frac{1}{a\,B(0.5, p+1)} (1-y)^{p} = f(a\sqrt{y}; a, p).$$
As a result of simple transformations,
$$g(y; p) = \frac{1}{B(0.5, p+1)}\, y^{-0.5} (1-y)^{p}.$$
The above equation is the PDF of the beta distribution with the parameters 0.5 and p + 1. We obtain the same result for −1 < p < 0 and support (−a, a). The proof is complete.
Proof. The functions $x = \pm a\sqrt{y}$ are strictly monotonic. The inverse function of both $x = a\sqrt{y}$ and $x = -a\sqrt{y}$ is $y = x^2/a^2$, with $|dy/dx| = 2|x|/a^2$. In this situation, we can write the PDF of X as
$$f(x; a, p) = \frac{|x|}{a^2}\, g\!\left(\frac{x^2}{a^2}; 0.5, p+1\right).$$
Using the PDF of the beta distribution in (3), we obtain (1). The proof is complete.
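Theorem 3. If X ∼ ECK(a > 0, p > −1) with the PDF f(x; a, p) (1), then the CDF of X is given by
$$F(x; a, p) = \frac{1}{2}\left[1 + \mathrm{sgn}(x)\, G\!\left(\frac{x^2}{a^2}; 0.5, p+1\right)\right], \qquad (4)$$
where sgn and G are the signum function and the CDF of the beta distribution, respectively.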
The proof is complete.
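As a hedged numerical companion to Theorem 3, the sketch below evaluates the CDF in Python without a beta-CDF routine: by symmetry, F(x) = 0.5 + sgn(x) times the integral of the density from 0 to |x|, which is approximated here with composite Simpson's rule. This is a swapped-in technique, not the paper's `pECK` implementation; the names `eck_cdf` and `_pdf` are illustrative, and the sketch is restricted to p ≥ 0 because the density is unbounded near ±a when p < 0.

```python
import math

def _pdf(x, a, p):
    # ECK density for p >= 0; constant computed via log-gamma for stability.
    log_beta = math.lgamma(0.5) + math.lgamma(p + 1.0) - math.lgamma(p + 1.5)
    return (1.0 - (x / a) ** 2) ** p / (a * math.exp(log_beta))

def eck_cdf(x, a, p, n=2000):
    """CDF of ECK(a, p), p >= 0, by Simpson integration of the density."""
    if a <= 0 or p < 0:
        raise ValueError("this sketch assumes a > 0 and p >= 0")
    if n % 2:
        raise ValueError("n must be even for Simpson's rule")
    if x <= -a:
        return 0.0
    if x >= a:
        return 1.0
    upper = abs(x)
    h = upper / n
    total = _pdf(0.0, a, p) + _pdf(upper, a, p)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * _pdf(i * h, a, p)
    half_mass = total * h / 3.0  # integral of the density over [0, |x|]
    return 0.5 + math.copysign(half_mass, x)
```

For p = 0 the result reduces to the straight line (x + a)/(2a) of the uniform distribution.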
The R codes of the pECK function for computing the CDF are provided in the Appendix. Figure 3 plots the CDF of the ECK(a > 0, p > −1) distribution for some parameter values. For p = 0 we obtain a straight line (uniform distribution). For p > 0 the CDF is convex on [−a, 0) and concave on (0, a]. For −1 < p < 0 the CDF is concave on (−a, 0) and convex on (0, a). For p = 170 the CDFs of the ECK and normal distributions practically coincide.
Theorem 4. The ECK(a > 0, p > −1) distribution with PDF given by (2) is identifiable in the parameter space v = (a, p).
The proof is complete because the beta distribution is identifiable.
Proof. Let p ≥ 0. Then the PDF of the ECK distribution for any x ∈ [−a, a] is given by
$$f(x; a, p) = \frac{\Gamma(p+1.5)}{a\sqrt{\pi}\,\Gamma(p+1)} \left(1 - \frac{x^2}{a^2}\right)^{p} \qquad (12)$$
and its first derivative by
$$f'(x; a, p) = \frac{-2p\,\Gamma(p+1.5)}{a^3\sqrt{\pi}\,\Gamma(p+1)}\, x \left(1 - \frac{x^2}{a^2}\right)^{p-1}. \qquad (13)$$
As a result of simple transformations, $x_m = 0$, and (13) is positive on the interval (−a, 0) and negative on the interval (0, a).
Let −1 < p < 0. Then the PDF (12) is defined for any x ∈ (−a, a). As a result of simple transformations, (13) is negative on the interval (−a, 0) and positive on the interval (0, a). For x values very close to −a and a, the PDF (12) takes its locally largest values. The author denotes these values as $x_m(-a)$, $x_m(a)$ and describes the proposed distribution as pseudo-bimodal, with modes at these points. The proof is complete.
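Theorem 6. Let X ∼ ECK(a > 0, p > −1). The inflection points of f(x; a, p) (1) for p > 1 are given by
$$x_1 = -\frac{a}{\sqrt{2p-1}}, \qquad x_2 = \frac{a}{\sqrt{2p-1}}.$$
Proof. We can write (13) as
$$f'(x; a, p) = A\,x\left(1 - \frac{x^2}{a^2}\right)^{p-1}, \qquad A = \frac{-2p\,\Gamma(p+1.5)}{a^3\sqrt{\pi}\,\Gamma(p+1)}.$$
Then the second derivative of (12) is given by
$$f''(x; a, p) = A\left(1 - \frac{x^2}{a^2}\right)^{p-2}\left[1 - (2p-1)\frac{x^2}{a^2}\right]. \qquad (18)$$
Based on (18), for p > 0.5 the second derivative vanishes at $x_1 = -a/\sqrt{2p-1}$ and $x_2 = a/\sqrt{2p-1}$. For a > 0 ∧ p > 0.5 we have $x_1 < x_2$; hence, taking into account the domain (−a, a), we obtain
$$-a < x_1 \;\wedge\; x_2 < a.$$
Solving the above inequalities, we finally have p > 1. The proof is complete.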

Quantiles
Theorem 7. Let X ∼ ECK(a > 0, p > −1). The q-th (0 < q < 1) quantile $x_q$ is the solution of the equation
$$\frac{1}{2}\left[1 + \mathrm{sgn}(x_q)\, G\!\left(\frac{x_q^2}{a^2}; 0.5, p+1\right)\right] = q, \qquad (20)$$
where sgn and G are the signum function and the CDF of the beta distribution, respectively.
Since the proposed distribution is symmetric, $x_q = -x_{1-q}$ and $x_{0.5} = 0$.
The quantile x q can be computed by numerical methods. The R codes of the qECK function for computing the quantile x q are provided in the Appendix.
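One numerical method that fits this setting is bisection on the CDF over the support [−a, a]. The Python sketch below is an illustration under stated assumptions rather than the paper's `qECK` routine: it takes any non-decreasing CDF as a callable (the hypothetical argument `cdf`), and the usage line exercises it only on the uniform case p = 0, where the CDF has the closed form (x + a)/(2a).

```python
def eck_quantile(q, a, cdf, tol=1e-10):
    """Solve cdf(x) = q on [-a, a] by bisection; cdf must be non-decreasing."""
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    lo, hi = -a, a
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cdf(mid) < q:
            lo = mid  # the quantile lies to the right of mid
        else:
            hi = mid  # the quantile lies at or to the left of mid
    return 0.5 * (lo + hi)

# Usage on the uniform special case ECK(a = 2, p = 0).
x_q = eck_quantile(0.75, 2.0, lambda x: (x + 2.0) / 4.0)
```

Bisection is slower than Newton-type iterations but needs no derivative and cannot leave the support, which is convenient when p < 0 and the density is unbounded near ±a.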
Changing the value of the shape parameter p, we can obtain the excess kurtosis of the ECK(a > 0, p > −1) distribution in the range (−2, 0) (Figure 4). As the shape parameter p increases, the excess kurtosis increases from −2 to 0.
Theorem 10. The moment generating function of the ECK(a > 0, p > −1) distribution is given by
$$M(t) = {}_0F_1\!\left(; p+1.5; \frac{a^2 t^2}{4}\right),$$
where ${}_0F_1(a; x)$ is the confluent hypergeometric function.
Proof. Based on (1) we have
$$M(t) = \int_{-a}^{a} e^{tx} f(x; a, p)\,dx = {}_0F_1\!\left(; p+1.5; \frac{a^2 t^2}{4}\right). \qquad (36)$$
Formula (36) can be written using a power series (Wolfram (1988))
$$M(t) = \sum_{k=0}^{\infty} \frac{1}{(p+1.5)_k\, k!} \left(\frac{a^2 t^2}{4}\right)^{k},$$
where $(p+1.5)_k = \Gamma(p+1.5+k)/\Gamma(p+1.5)$ is the Pochhammer symbol. The proof is complete.
Moors (1988) proposed a measure based on quantiles in the form
$$T = \frac{(x_{7/8} - x_{5/8}) + (x_{3/8} - x_{1/8})}{x_{6/8} - x_{2/8}}, \qquad (38)$$
where $x_q$ is the solution of (20). The measure (38) is a quantile alternative to kurtosis and exists even for distributions for which no moments exist. Of course, this measure does not depend on the scale parameter a. Figure 5 shows the measure T as a function of the shape parameter p. T(p) is a strictly increasing function over the initial range of p values.

Moments of order statistics
Theorem 11. Let $X_{i,n}$ be the i-th order statistic ($X_{1,n} \le X_{2,n} \le \dots \le X_{n,n}$) in a sample of size n from the ECK(a > 0, p > −1) distribution. The k-th moment of the i-th order statistic $X_{i,n}$ is defined as
$$E\left(X_{i,n}^{k}\right) = \int_{-a}^{a} x^{k} f_{i,n}(x; a, p)\,dx$$
and
$$f_{i,n}(x; a, p) = \frac{n!}{(i-1)!\,(n-i)!}\, F(x; a, p)^{i-1} \left[1 - F(x; a, p)\right]^{n-i} f(x; a, p),$$
where F(x; a, p) is the CDF (4).
Proof. The proof, based on the definition of the PDF of order statistics, is trivial.
Figure 6 shows the PDF of $X_{5i,30}$ (i = 1, . . . , 5) of the ECK(a > 0, p > −1) distribution.
2.6. Random number generator
Theorem 12. Let X ∼ ECK(a > 0, p > −1), R ∼ U(0, 1). The random number generator of X, using the inverse CDF method, is given by
$$X = a\,\mathrm{sgn}(R - 0.5)\sqrt{G^{-1}\left(|2R - 1|; 0.5, p+1\right)},$$
where $G^{-1}$ is the quantile function of the beta distribution.
The proof, using inversion of the CDF (4), is trivial.
The R codes of the rECK function for generating n values of X are presented in the Appendix.
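The Python standard library has no beta quantile function, so the sketch below does not implement the inverse-CDF generator of Theorem 12. Instead it uses the transformation established earlier in Section 2.1: if Y follows the beta distribution with parameters 0.5 and p + 1, then ±a√Y with a random sign follows ECK(a, p). The name `eck_rvs` and the optional `rng` argument are assumptions for illustration and do not mirror the paper's `rECK`.

```python
import math
import random

def eck_rvs(n, a, p, rng=None):
    """Draw n values from ECK(a, p) as sign * a * sqrt(Y), Y ~ Beta(0.5, p + 1)."""
    if a <= 0 or p <= -1:
        raise ValueError("require a > 0 and p > -1")
    rng = rng or random.Random()
    sample = []
    for _ in range(n):
        y = rng.betavariate(0.5, p + 1.0)           # Y = X^2 / a^2
        sign = 1.0 if rng.random() < 0.5 else -1.0  # symmetry around zero
        sample.append(sign * a * math.sqrt(y))
    return sample
```

A quick sanity check is that the sample variance should be close to a²/(2p + 3), the variance implied by the beta representation.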
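Proof. First, we take the logarithm. From (1) we have
$$\ln f(x; a, p) = \ln\Gamma(p+1.5) - \ln a - \tfrac{1}{2}\ln\pi - \ln\Gamma(p+1) + p\ln\left(1 - \frac{x^2}{a^2}\right).$$
Second, we calculate the partial derivatives
$$\frac{\partial \ln f}{\partial a} = -\frac{1}{a} - \frac{2p\,x^2}{a\,(x^2 - a^2)}, \qquad \frac{\partial \ln f}{\partial p} = \Psi(p+1.5) + \ln\left(1 - \frac{x^2}{a^2}\right) - \Psi(p+1),$$
where Ψ is the digamma function. Hence, we get the Fisher score in the form
$$h(x; a, p) = \left[-\frac{1}{a} - \frac{2p\,x^2}{a\,(x^2 - a^2)},\;\; \Psi(p+1.5) + \ln\left(1 - \frac{x^2}{a^2}\right) - \Psi(p+1)\right]^{T}.$$
Let $u(x; a, p) = h(x; a, p)\,h(x; a, p)^{T}$, $A = \Psi(p+1.5)$ and $B = \Psi(p+1)$, and let $I_{i,j} = E[u_{i,j}]$ (i, j = 1, 2). To write the Fisher Information Matrix in a simpler form, we use (23), (29) and integration by the substitution $x/a$. The proof is complete.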

Maximum likelihood estimation
Let $x_1^*, x_2^*, \dots, x_n^*$ be a random sample of size n from the ECK(a > 0, p > −1) distribution. Our target is to estimate the unknown values of the parameters a, p. The likelihood function based on (1) is given by
$$L = \left[\frac{\Gamma(p+1.5)}{a\sqrt{\pi}\,\Gamma(p+1)}\right]^{n} \prod_{i=1}^{n}\left(1 - \frac{x_i^{*2}}{a^2}\right)^{p},$$
and the log-likelihood function is defined as
$$l = \ln(L) = n\ln\frac{\Gamma(p+1.5)}{a\sqrt{\pi}\,\Gamma(p+1)} + p\sum_{i=1}^{n}\ln\left(1 - \frac{x_i^{*2}}{a^2}\right),$$
with
$$\frac{dl}{da} = -\frac{n}{a} + \frac{2p}{a}\sum_{i=1}^{n}\frac{x_i^{*2}}{a^2 - x_i^{*2}} = 0 \qquad (57)$$
and
$$\frac{dl}{dp} = n\Psi(p+1.5) - n\Psi(p+1) + \sum_{i=1}^{n}\ln\left(1 - \frac{x_i^{*2}}{a^2}\right) = 0, \qquad (58)$$
where Ψ is the digamma function. The maximum likelihood estimates (MLEs) are solutions of the system of equations (57) and (58). From (57) we get
$$p = \frac{n}{2\sum_{i=1}^{n} x_i^{*2}/\left(a^2 - x_i^{*2}\right)}. \qquad (59)$$
Substituting (59) into (58), we obtain a nonlinear equation (60) in a. Solving (60) with a numerical method gives $\hat{a}$, and (59) then gives $\hat{p}$.
The biases and the root mean squared errors (RMSEs) of the MLEs are shown in Table 3. The simulation study was performed with $10^3$ samples using sample sizes of 50, 100, 200 and 500. The samples were drawn from the ECK(1, p), where p ∈ {1, 2, 3}. We observe that the estimates approach the true values as the sample size increases, which indicates the consistency of the estimates. The biases of $\hat{a}$ and $\hat{p}$ diminish for large samples and are smaller for $\hat{a}$ than for $\hat{p}$. The RMSEs increase with the value of p.
To examine the accuracy of the coverage probability of the asymptotic confidence intervals (CIs), another simulation study was performed with $10^3$ samples using sample sizes of 50, 100, 200 and 500. The study focused on the parameters a, p and samples drawn from the ECK(a = 1, p = 4). The coverage probabilities of the obtained 95% CIs for a = 1, p = 4 reported in Table 4 are very close to the nominal level. The results suggest that the obtained standard errors, and hence the asymptotic CIs, are reliable.
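To make the objective behind the MLEs concrete, the following Python sketch evaluates the log-likelihood of an ECK sample. It is an illustration only: the name `eck_loglik` is hypothetical, the sketch does not solve the score equations, and for simplicity it treats any scale a that does not strictly exceed the largest |x| in the sample as infeasible.

```python
import math

def eck_loglik(data, a, p):
    """Log-likelihood of an ECK(a, p) sample; -inf when a <= max |x|."""
    if a <= 0 or p <= -1 or max(abs(x) for x in data) >= a:
        return float("-inf")
    n = len(data)
    # log of the normalising constant Gamma(p+1.5) / (a sqrt(pi) Gamma(p+1))
    log_norm = (math.lgamma(p + 1.5) - math.log(a)
                - 0.5 * math.log(math.pi) - math.lgamma(p + 1.0))
    return n * log_norm + p * sum(math.log(1.0 - (x / a) ** 2) for x in data)
```

Such a function can be handed to any general-purpose optimiser (after negation); the constraint a > max |x| has to be respected by the search, which is why the sketch returns minus infinity outside it.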
The proof is complete.
The Renyi and Tsallis entropies converge to the Shannon entropy as α → 1. Figure 7 shows the Shannon, Renyi and Tsallis entropies for the ECK(a = 1, p > −1) distribution. The Shannon and Renyi entropies increase for p ∈ (−1, 0) and decrease for p > 0. The Tsallis entropy decreases for p ∈ (−1, 0) and increases for p > 0. The higher the value of a, the higher the value of S. The higher the value of α, the lower the value of $R_\alpha$.

Application
This section is divided into two subsections. We present examples of the applicability and flexibility of the ECK(a > 0, p > −1). The first subsection is devoted to the GoFTs, the second one deals with fitting distributions to data.

Comparison of goodness-of-fit tests
As mentioned in the Introduction, one advantage of the ECK distribution is a simple formula that allows the excess kurtosis to be changed. The distribution can be extremely useful when one wants to seamlessly test the GoFTs' ability to detect deviations from normality caused by a negative excess kurtosis.
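The excess kurtosis formula itself is not reproduced in this part of the text. As an assumption-labelled sketch: since X²/a² follows the beta distribution with parameters 0.5 and p + 1, the second and fourth moments are a²/(2p + 3) and 3a⁴/((2p + 3)(2p + 5)), which gives an excess kurtosis of −6/(2p + 5). This agrees with the range (−2, 0), with the uniform value −1.2 at p = 0, and with the pair p = 7.5, γ₂ = −0.3 used in the simulations. The Python function names below are illustrative.

```python
def eck_excess_kurtosis(p):
    """Excess kurtosis of ECK(a, p), derived from the beta moments of X^2/a^2."""
    if p <= -1:
        raise ValueError("require p > -1")
    return -6.0 / (2.0 * p + 5.0)

def shape_for_excess_kurtosis(g2):
    """Invert the formula above: shape p that yields excess kurtosis g2."""
    if not -2.0 < g2 < 0.0:
        raise ValueError("excess kurtosis must lie in (-2, 0)")
    return -3.0 / g2 - 2.5
```

The inverse function makes it easy to pick a shape parameter for a target excess kurtosis when designing a power study.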
For the $LF_m$ test statistic, α = β = 0 is recommended if the alternative distribution is both symmetric and has negative excess kurtosis.
The similarity measure M (2) of the N(0, 0.216) and ECK(a = 4, p = 170) distributions, as mentioned in Section 2.1, is 0.999. Figure 8 shows the PDFs of the N(0, 0.216) and ECK(a = 4, p) distributions involved in the Monte Carlo simulation. The legend of this figure gives the values of the similarity measure M (2) for these distributions. As p increases, the similarity measure M also increases.
Phase 1. In this phase the aim is to investigate to what degree the selected GoFTs listed in Table 5 are able to distinguish between the N(0, 0.216) and ECK(a = 4, p) distributions. In other words, the aim is to determine the powers of the GoFTs under discussion when samples come from ECK(a = 4, p) general populations. Table 5 shows how the ECK distribution tends to the Normal distribution in terms of kurtosis as its shape parameter increases.
To accomplish this aim, the critical values $cv_{0.05}$ of the GoFTs (where α = 0.05 is the significance level) were needed. These $cv_{0.05}$ values were estimated with the Monte Carlo method. Seven large-scale experiments were performed, each devoted to one of the GoFTs. Each experiment consisted of generating $10^6$ samples of sizes n = 30 and n = 50. The samples followed the N(0, 0.216) distribution. Each sample was tested for normality. The values of the test statistics obtained in this way (denoted $Q_i$, i = 1, 2, ..., m) were collected and then ranked. Critical values were assessed according to the formula $cv_{0.05} = Q_{[0.95m]}$. Table 6 presents the obtained $cv_{0.05}$ critical values. Tables 7 and 8, in turn, present the relevant test powers when samples come from the ECK general populations. The scale parameter was set to 4. Values of the shape parameter are listed in Table 5.
The powers of the considered tests are close to the significance level we set. This means that in Phase 1 we revealed that the considered GoFTs are unable to distinguish between a distribution with negative excess kurtosis (not only slight but even moderate, i.e. $\gamma_2 = -0.3$) and the Normal distribution.
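The critical-value step of Phase 1 can be sketched in Python as follows. The statistic used here, minus the sample excess kurtosis, is a hypothetical stand-in chosen only to keep the example self-contained; it is not one of the seven GoFTs in the study, and the names `neg_excess_kurtosis` and `mc_critical_value` are assumptions. The number of replications is also far smaller than the 10⁶ used in the paper.

```python
import random

def neg_excess_kurtosis(sample):
    """Hypothetical stand-in statistic: minus the sample excess kurtosis."""
    n = len(sample)
    mean = sum(sample) / n
    m2 = sum((x - mean) ** 2 for x in sample) / n
    m4 = sum((x - mean) ** 4 for x in sample) / n
    return -(m4 / m2 ** 2 - 3.0)

def mc_critical_value(statistic, n, m, sigma=0.216, level=0.95, seed=0):
    """Simulate m normal samples of size n and return (Q_[level*m], sorted Q)."""
    rng = random.Random(seed)
    stats = sorted(
        statistic([rng.gauss(0.0, sigma) for _ in range(n)]) for _ in range(m)
    )
    k = max(1, min(m, round(level * m)))  # 1-based rank [0.95 m]
    return stats[k - 1], stats

if __name__ == "__main__":
    cv, _ = mc_critical_value(neg_excess_kurtosis, n=30, m=2000)
    print("estimated critical value:", cv)
```

The power of a test against ECK(4, p) would then be estimated as the share of ECK samples whose statistic exceeds the returned critical value.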
A natural question is whether this also holds for the numerous distributions similar to the new proposition. Some investigations were performed for them, although less in-depth than the one presented above. On their basis, however, one can tentatively say that the above conclusions relate to those distributions too.
Phase 2. In this phase the aim is to investigate to what degree an undetected negative kurtosis impacts the performance of two basic tests related to the parameters of the Normal distribution, namely the Student t-test and the Fisher-Snedecor F test. To accomplish this aim, we employ the Monte Carlo method and determine the empirical CDFs of the t and F test statistics in the case when samples come from the ECK(4, 7.5) general population. Then we compare these empirical CDFs with the "true" t and F CDFs, i.e. those that hold when samples come from the Normal general population. These comparisons are made simply with the Kolmogorov-Smirnov (K-S) GoFT.
Since the theoretical distributions are the t and F distributions, their parameters are known: they are the so-called degrees of freedom, equal to n − 1. Therefore the K-S test can be applied in its classic, in other words pre-Lilliefors, form. What speaks for the use of the K-S test is that it is powerful when the sample is very large and the parameters are not estimated from the sample but exactly known. Procedures that return values of the "true" t and F CDFs are implemented in many computational environments, including the R software.
Let $x_{1,1}, x_{1,2}, \dots, x_{1,n}$ and $x_{2,1}, x_{2,2}, \dots, x_{2,n}$ be two samples of size n drawn from particular general populations. Recall that the t and F test statistics are functions of the sample means $\bar{x}_1$, $\bar{x}_2$ and the sample standard deviations $s_{x_1}$, $s_{x_2}$.
The course of action was as follows:
Step 1: $10^4$ pairs of samples, both of size n = 100, were drawn from the ECK(4, 7.5) general population.
Step 3: Sets of values of the $\dot{t}_v$ and $\dot{F}_v$ statistics were stored in two matrices named T and F.
Step 4: The matrices were sorted in ascending order and used to determine two empirical CDFs, namely $\Theta_t(\dot{t}_v)$ and $\Theta_F(\dot{F}_v)$.
Step 5: Probability papers were employed to check whether the above empirical CDFs fit the Student and Fisher-Snedecor distributions. Figure 9 shows the empirical CDFs of Step 4 plotted on the Student and Snedecor probability papers. These probability papers were constructed in the same way as the Normal probability paper, which is commonly used by practitioners all over the world. It turns out that the empirical distributions in question fit the straight lines that represent the relevant theoretical distributions very closely. Thus, we can conclude that the Student and Fisher-Snedecor tests may be applied even when the population distributions have a negative excess kurtosis, whether slight or indeed moderate.

Fitting distributions to data
As is well known, symmetric distributions (e.g. the normal distribution) have limited use in fitting distributions to data. However, the situation looks much better when we use their mixtures (e.g. the compound normal distribution).
In this subsection, we present a real data example to demonstrate the flexibility of the ECK(a > 0, p > −1) distribution in the mixed variant. The PDF of the compound ECK (CECK) distribution is given by
$$CECK(a, p_1, p_2, \omega) = \omega\,ECK(a, p_1) + (1 - \omega)\,ECK(a, p_2).$$
The estimation of the model parameters is carried out by the maximum likelihood method.
To avoid local maxima of the logarithmic likelihood function, the optimization routine is run 100 times with several different starting values that are widely scattered in the parameter space.
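For illustration, the CECK density above can be sketched in Python as a weighted sum of two ECK densities sharing the scale a. The function names `eck_pdf` and `ceck_pdf` are assumptions, the sketch covers the density only, and the multi-start optimisation described in the text is not shown.

```python
import math

def eck_pdf(x, a, p):
    """ECK(a, p) density; zero outside the support."""
    if abs(x) > a or (abs(x) == a and p < 0):
        return 0.0
    log_beta = math.lgamma(0.5) + math.lgamma(p + 1.0) - math.lgamma(p + 1.5)
    return (1.0 - (x / a) ** 2) ** p / (a * math.exp(log_beta))

def ceck_pdf(x, a, p1, p2, omega):
    """Two-component mixture: omega * ECK(a, p1) + (1 - omega) * ECK(a, p2)."""
    if not 0.0 <= omega <= 1.0:
        raise ValueError("omega must lie in [0, 1]")
    return omega * eck_pdf(x, a, p1) + (1.0 - omega) * eck_pdf(x, a, p2)
```

Because both components are centred at zero, the mixture as written is still symmetric around zero; fitting real data would additionally require a location shift, which this sketch leaves out.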
Real data example
The real data present the temperature dynamics of the beaver Castor canadensis in north-central Wisconsin (Reynolds (1994)). Body temperature was measured by telemetry every 10 minutes over one period of less than a day. The data consist of 114 observations of the variable "measured body temperature in degrees Celsius" and are available in the R software with the code beaver1[3].
The models selected for comparison with the CECK(a, p_1, p_2, ω) are:
• the compound normal (CN): $f_{CN}(x; a_1, b_1, a_2, b_2, \omega) = \omega\,\varphi(x; a_1, b_1) + (1 - \omega)\,\varphi(x; a_2, b_2)$
• the compound Laplace (CL): $f_{CL}(x; a_1, b_1, a_2, b_2, \omega) = \omega f_L(x; a_1, b_1)$ (a < b, n = 1, 2, ...).
Table 9 presents the MLEs, the log-likelihood function l, AIC, BIC and HQIC for the data set. Models are sorted by AIC values. Figure 10 presents histograms, estimated PDFs and CDFs of the analyzed models.
Table 10 shows p-values for the KS, AD and CvM GoFTs calculated as follows. First, we obtain the values of the KS, AD and CvM test statistics (denoted ST) for true values of parameters Θ based on the sample $x_{(1)}, x_{(2)}, \dots, x_{(n)}$. In the next step we simulate $10^4$ samples $x_{(1)}, x_{(2)}, \dots, x_{(n)}$ from the given distribution with true values of parameters Θ. For each sample, we calculate the values of the KS, AD and CvM test statistics (denoted $ST^s$). Finally, the p-value is calculated as $p \approx \#\{i : ST^s_i > ST\} \cdot 10^{-4}$.
The CECK model is the best in terms of the AIC, BIC and HQIC values (see Table 9). This model has the highest p-values (see Table 10). Therefore, the CECK model fits better than the other models analyzed in this case.
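The two bookkeeping steps of this comparison can be sketched in Python as below. The simulated p-value follows the counting formula given in the text. The information criteria use their standard textbook definitions (AIC = 2k − 2l, BIC = k ln n − 2l, HQIC = 2k ln(ln n) − 2l), which is an assumption here because the paper's own definitions are not reproduced in this section; both function names are illustrative.

```python
import math

def mc_p_value(st_obs, st_sim):
    """Share of simulated statistics strictly greater than the observed one."""
    return sum(1 for s in st_sim if s > st_obs) / len(st_sim)

def information_criteria(loglik, k, n):
    """Return (AIC, BIC, HQIC) for log-likelihood loglik, k parameters, n data."""
    aic = 2.0 * k - 2.0 * loglik
    bic = k * math.log(n) - 2.0 * loglik
    hqic = 2.0 * k * math.log(math.log(n)) - 2.0 * loglik
    return aic, bic, hqic
```

For the CECK model, k = 4 (a, p₁, p₂, ω) and n = 114 for the beaver data; smaller criterion values indicate a better trade-off between fit and complexity.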

Conclusions
The paper presents the easily changeable kurtosis (ECK) distribution, a special case of which is the uniform distribution. The new distribution, for large values of the parameter p, is similar (not identical) to the normal distribution.
The ECK belongs to the family of distributions with one mode, excess kurtosis values on a finite interval, and a continuous function $p = f(\gamma_2)$, where p is the shape parameter. The obtained results demonstrate that the ECK distribution can be extremely useful when we want to seamlessly test the GoFTs' ability to detect deviations from normality by modeling negative excess kurtosis. We revealed that the considered GoFTs are unable to distinguish between a distribution with negative excess kurtosis (not only slight but even moderate) and the Normal distribution. One can tentatively say that the above conclusion relates to numerous distributions similar to the new proposition.
The Student and Fisher-Snedecor tests may be applied even when the population distributions have a negative excess kurtosis, whether slight or indeed moderate.
The real data example demonstrates that the ECK(a, p) distribution in the mixed variant is a flexible and competitive model that deserves to be added to the existing distributions in data modeling.
The information presented in the article shows that the proposed distribution deserves to be added to the symmetric distribution family.