Different Classical Methods of Estimation and Chi-squared Goodness-of-fit Test for Unit Generalized Inverse Weibull Distribution

In this paper, we contribute to the distribution theory literature by introducing a new bounded distribution on the interval (0, 1), called the unit generalized inverse Weibull distribution (UGIWD), obtained by a transformation method. The proposed distribution exhibits increasing and bathtub-shaped hazard rate functions. We derive some basic statistical properties of the new distribution. Based on a complete sample, the model parameters are estimated by the methods of maximum likelihood, least squares, weighted least squares, percentiles, maximum product of spacings and Cramér-von-Mises, and the estimators are compared using a Monte Carlo simulation study. In addition, bootstrap confidence intervals of the model parameters based on these methods of estimation are obtained. We illustrate the performance of the proposed distribution by means of one real data set, for which the new distribution is more appropriate than the unit Birnbaum-Saunders, unit gamma, unit Weibull, Kumaraswamy and unit Burr III distributions. Further, we construct chi-squared goodness-of-fit tests for the UGIWD using right censored data based on the Nikulin-Rao-Robson (NRR) statistic and its modification. The test criterion is the modified chi-squared statistic $Y^2$, developed by Bagdonavičius and Nikulin (2011) for some parametric models when data are censored. The performance of the proposed test is shown by an intensive simulation study and an application to a real data set.


Introduction
An integral part of many statistical studies is the collection of information about the form of the population from which the data are obtained. For this purpose, statisticians often use goodness-of-fit (GOF) tests to determine whether the observed sample data "fit" some proposed model. To validate the model, tools such as graphical tests, chi-squared tests, and the Kolmogorov-Smirnov and Anderson-Darling statistics, among many others, are employed. The objective of these tests is to measure the distance between the observed values and the expected theoretical values. The chosen model is rejected when this distance exceeds the critical value. The standard tables for these tests are not valid when the parameters are unknown. Further, goodness-of-fit procedures designed for complete samples are inappropriate for censored samples (see Badr (2019)).
The above cited distributions are extended forms of the inverse Weibull distribution and have been derived by incorporating additional parameters into the original probability distribution. In addition, they are supported on the positive part of the real line. At the same time, probability distributions with support on a finite range play a key role in many studies. For instance, many life test experiments produce data on some finite range: data on fractions, percentages, per capita income growth, fuel efficiency of vehicles, height and weight of individuals, survival times from a deadly disease, etc. are likely to lie in some bounded positive interval (see Kumaraswamy (1980), Gómez-Déniz, Sordo, and Calderín-Ojeda (2013), Mazucheli, Menezes, and Ghitany (2018a), Mazucheli, Menezes, and Dey (2018b), Mazucheli, Menezes, and Dey (2018c), Mazucheli, Menezes, and Dey (2019)). Owing to evolving problems in life testing experiments, statisticians require more and more distributions with finite support.
In this paper, we first derive a new bounded distribution from the generalized inverse Weibull distribution through the transformation $X = T/(1+T)$, where $T$ has the generalized inverse Weibull distribution. We obtain a new distribution with support on (0, 1), which we refer to as the unit generalized inverse Weibull distribution (UGIWD). This distribution is capable of modelling increasing and bathtub-shaped hazard rates. Second, we obtain maximum likelihood, least squares, weighted least squares, percentile, maximum product of spacings and Cramér-von-Mises estimators for the unknown parameters of the model based on a complete sample. In addition, bootstrap confidence intervals (BCIs) of the model parameters based on these methods of estimation are obtained. Next, we construct chi-squared tests for the UGIWD when data are right censored. We use the modified chi-squared statistic developed by Bagdonavičius, Levuliene, and Nikulin (2013) for some parametric accelerated failure time models. This technique has been used to validate models such as the Weibull extension accelerated failure time model (Seddik-Ameur and Wafa 2018) and a competing risk model (Chouia and Seddik-Ameur 2017).
The organization of this article is as follows. In Section 2, the model description is provided. In Section 3, some basic properties of the model are derived. In Section 4, six different classical methods of estimation based on complete samples are discussed. A Monte Carlo simulation study is carried out in Section 5 to compare the different methods of estimation. The potential of the new model is illustrated by means of an application to a real data set in Section 6. In Section 7, maximum likelihood estimation based on right censored data is discussed. The estimated Fisher information matrix is obtained in Section 8. In Section 9, a test statistic for right censored data is proposed for the model. To study the performance of the test statistic, a simulation study based on right censored samples is carried out in Section 10. To confirm the practicability of the proposed goodness-of-fit test and the usefulness of the model, one real data set is analyzed in Section 11. Finally, conclusions are given in Section 12.

Model description
If a random variable $T$ follows the generalized inverse Weibull (GIW) distribution, then $X = T/(1+T)$ follows the UGIWD. The cumulative distribution function of the GIW distribution is given by Thus the UGIWD with three parameters has the density function The cumulative distribution function (CDF), survival function (SF) and hazard rate function of the UGIWD are, respectively, given by and the cumulative hazard rate function is given by
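The transformation step can be sketched numerically. Because $X = T/(1+T)$ is increasing in $T$, the CDF of $X$ is $F_X(x) = G(x/(1-x))$ and its density is $g(x/(1-x))/(1-x)^2$, where $G$ and $g$ are the CDF and density of $T$. The GIW form used below, $G(t) = \exp\{-\alpha(\lambda/t)^{\beta}\}$, is an assumption made only for illustration and may differ from the parametrization of the paper; the function names are hypothetical.

```python
import math

def giw_cdf(t, alpha, lam, beta):
    """ASSUMED GIW CDF: G(t) = exp(-alpha * (lam / t)**beta) for t > 0."""
    if t <= 0.0:
        return 0.0
    return math.exp(-alpha * (lam / t) ** beta)

def giw_pdf(t, alpha, lam, beta):
    """Density of the assumed GIW form (derivative of giw_cdf with respect to t)."""
    if t <= 0.0:
        return 0.0
    return (alpha * beta * lam ** beta * t ** (-beta - 1.0)
            * giw_cdf(t, alpha, lam, beta))

def unit_cdf(x, alpha, lam, beta):
    """CDF of X = T / (1 + T): F_X(x) = G(x / (1 - x)) on (0, 1)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    return giw_cdf(x / (1.0 - x), alpha, lam, beta)

def unit_pdf(x, alpha, lam, beta):
    """Density of X by change of variables: g(x / (1 - x)) / (1 - x)**2."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return giw_pdf(x / (1.0 - x), alpha, lam, beta) / (1.0 - x) ** 2

def unit_hazard(x, alpha, lam, beta):
    """Hazard rate h(x) = f(x) / (1 - F(x)) for 0 < x < 1."""
    return unit_pdf(x, alpha, lam, beta) / (1.0 - unit_cdf(x, alpha, lam, beta))
```

Only `giw_cdf` and `giw_pdf` depend on the assumed parametrization; the three `unit_*` functions apply to any baseline distribution on the positive half-line.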

Statistical and mathematical properties
In this section, we derive some statistical and mathematical properties of the UGIW distribution.

Moments and moment generating function
The moments, incomplete moments, moment generating function, skewness and kurtosis of a probability distribution are important tools for describing the distribution. The nth moment of the UGIW distribution is given by For $|x| < \lambda$ and a negative integer $-n$, the following power series expansion holds Hence, we can write The moment generating function of the UGIW distribution can be computed as In Table 1, we present the expected values, variances, skewness and kurtosis of the UGIW distribution for various values of α and β. One can see from Table 1 that the means and variances are increasing with respect to α and β, whereas the coefficient of variation (CV), skewness and kurtosis are decreasing with respect to α and β.

Quantiles, L-moments and measures of skewness and kurtosis
Like the moments, the quantile function can be used to characterize a probability distribution. The quantile function also represents the distribution and can be considered as an alternative tool for data analysis; see Nair, Sankaran, and Balakrishnan (2013). Let $F(Q_p; \alpha, \lambda, \beta)$ be the CDF of the UGIW distribution at the pth quantile $Q_p$. Then the pth quantile of the UGIW random variable is given by In particular, the three quartiles $Q_1$, $Q_2$ and $Q_3$ can be obtained by setting $p = 0.25$, $p = 0.5$ and $p = 0.75$ in equation (7), respectively.
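As an illustration of how a closed-form quantile function is used, the sketch below inverts the CDF and draws samples by the inverse-transform method. It assumes the GIW form $G(t) = \exp\{-\alpha(\lambda/t)^{\beta}\}$, which is a hypothetical parametrization and not necessarily the one behind equation (7); the function names are likewise illustrative.

```python
import math
import random

def unit_cdf(x, alpha, lam, beta):
    """CDF of X = T / (1 + T) under the ASSUMED form G(t) = exp(-alpha * (lam / t)**beta)."""
    t = x / (1.0 - x)
    return math.exp(-alpha * (lam / t) ** beta)

def unit_quantile(p, alpha, lam, beta):
    """Solve p = G(t) for t, then map back to the unit interval via x = t / (1 + t)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    t = lam * (alpha / (-math.log(p))) ** (1.0 / beta)
    return t / (1.0 + t)

def sample_unit(n, alpha, lam, beta, seed=0):
    """Inverse-transform sampling: apply the quantile function to uniform draws."""
    rng = random.Random(seed)
    draws = []
    while len(draws) < n:
        p = rng.random()          # lies in [0, 1)
        if p > 0.0:               # skip an exact zero, where the quantile is undefined
            draws.append(unit_quantile(p, alpha, lam, beta))
    return draws

# Quartiles for one illustrative parameter choice.
quartiles = [unit_quantile(p, 2.0, 1.5, 3.0) for p in (0.25, 0.5, 0.75)]
```

The same sampler can feed a simulation study, since it only needs uniform random numbers and the quantile function.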
One application of the quantile function is the computation of L-moments. L-moments are expectations of linear combinations of order statistics and can be used to summarize the location, scale, skewness and kurtosis of the distribution. The sth L-moment is defined by The coefficients of skewness and kurtosis based on quantiles are given by

Conditional moment and mean deviation
Here, we introduce an important lemma which will be used in the next section.
Lemma 1. Let $X$ be a random variable with pdf given in (1) and let $J_n(t) = \int_0^t x^n f(x)\,dx$. Then we have where $\gamma(a, x)$ denotes the lower incomplete gamma function, defined by $\gamma(a, x) = \int_0^x t^{a-1} e^{-t}\,dt$.
Proof. Using equation (1), we have where $u = \lambda(1-x)/x$. The result follows by using Equation (3.381.8), page 346, in Gradshteyn and Ryzhik (2014) to calculate the integral in (9). The proof is complete.
The nth conditional moment of the UGIW distribution is given by It can be expressed by using (3), (6) and (8). The same remark holds for the nth reversed moment of the UGIW distribution, which is given by An application of the conditional moments is the mean residual life (MRL). The MRL function is the expected remaining life, $X - x$, given that the item has survived to time $x$; in life testing situations, it is the expected additional lifetime of a component that has survived until time $x$. The MRL function can be written in terms of the first conditional moment as where $J_1(x)$ can be obtained from (9) when $n = 1$.
Another application of the conditional moments is the mean deviations about the mean and the median. They are used to measure the dispersion and the spread in a population from the center. If we denote the median by M , then the mean deviations from the mean and the median can be calculated as (8). Also, F (µ) and F (M ) can be easily calculated from (2).
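The link between the mean deviations and the function $J_1$ of Lemma 1 can be made explicit through two standard identities, valid for any continuous distribution on (0, 1) with mean $\mu$ and median $M$. The derivation below is a generic sketch and is not claimed to reproduce the displayed equations of the paper; the symbols $\delta_1$ and $\delta_2$ are introduced here for illustration.

```latex
\begin{align*}
\delta_1 = E|X-\mu|
  &= \int_0^{\mu} (\mu - x) f(x)\,dx + \int_{\mu}^{1} (x-\mu) f(x)\,dx \\
  &= \mu F(\mu) - J_1(\mu) + \bigl[\mu - J_1(\mu)\bigr] - \mu\bigl[1-F(\mu)\bigr]
   = 2\mu F(\mu) - 2 J_1(\mu), \\
\delta_2 = E|X-M|
  &= M F(M) - J_1(M) + \bigl[\mu - J_1(M)\bigr] - M\bigl[1-F(M)\bigr]
   = \mu - 2 J_1(M),
\end{align*}
```

where the last equality uses $F(M) = 1/2$ and $J_1(t) = \int_0^t x f(x)\,dx$.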

Entropy and stress-strength reliability
Entropy is useful for quantifying the uncertainty of a random experiment. It was initially used in assessing the quality of communications. The Rényi entropy, which generalizes the Hartley and Shannon entropies, is given by The δ-entropy, say $I_\delta(x)$, is defined by and then it follows from equation (10).
The stress-strength reliability has been widely used in reliability analysis as a measure of system performance under stress. In terms of probability, the stress-strength reliability can be obtained as where $X$ denotes the strength of the system and $Y$ denotes the stress applied to the system. The probability $R$ can be used to compare two random variables encountered in various applied disciplines. For UGIW random variables $X \sim UGIW(\alpha_1, \lambda_1, \beta_2)$ and $Y \sim UGIW(\alpha_2, \lambda_2, \beta_2)$, $R$ is given by

Order statistics
Let $X_1, X_2, \ldots, X_n$ be a random sample from the UGIW distribution and let $X_{1:n} \le X_{2:n} \le \cdots \le X_{n:n}$ be the corresponding order statistics. Then the pdf $f_{r:n}(x)$ of the rth order statistic, for $r = 1, 2, \ldots, n$, is obtained as follows: where $f(x)$ and $F(x)$ are the PDF and CDF of the UGIW distribution, respectively. The PDF of the rth order statistic is The kth moment of the rth order statistic is given by

Maximum likelihood estimation
Here, the parameters of the UGIW distribution are estimated by the method of maximum likelihood. Let $X_1, X_2, \ldots, X_n$ be a random sample from the UGIW distribution; then the likelihood function can be written as By taking the natural logarithm, the log-likelihood function is obtained as Let Then the components of the score function are Setting these equations to zero and solving them simultaneously yields the MLEs of the parameters α, β and λ. To solve these equations, it is usually more convenient to use nonlinear optimization methods such as a quasi-Newton algorithm.
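In practice the score equations are rarely solved directly; a negative log-likelihood is coded and handed to a numerical optimizer. The sketch below shows only the mechanics and rests on the assumed GIW form $G(t) = \exp\{-\alpha(\lambda/t)^{\beta}\}$, which may differ from the paper's parametrization. Under this assumed form α and λ enter only through $\alpha\lambda^{\beta}$, so the example should not be read as the paper's likelihood.

```python
import math

def unit_log_pdf(x, alpha, lam, beta):
    """Log-density of X = T / (1 + T) under the ASSUMED GIW form (hypothetical)."""
    t = x / (1.0 - x)
    return (math.log(alpha) + math.log(beta) + beta * math.log(lam)
            - (beta + 1.0) * math.log(t)      # t**(-beta - 1) factor of the GIW density
            - alpha * (lam / t) ** beta       # exponent of the GIW density
            - 2.0 * math.log(1.0 - x))        # Jacobian of t = x / (1 - x)

def neg_log_likelihood(params, data):
    """Objective to minimize; returns +inf outside the parameter space."""
    alpha, lam, beta = params
    if min(alpha, lam, beta) <= 0.0:
        return math.inf
    return -sum(unit_log_pdf(x, alpha, lam, beta) for x in data)

# Illustrative evaluation on a tiny made-up sample.
value = neg_log_likelihood((1.0, 1.0, 2.0), [0.3, 0.5, 0.7])
```

A function of this shape can be passed to a quasi-Newton routine, for example `scipy.optimize.minimize` with `method="BFGS"`, if SciPy is available.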

Method of ordinary and weighted least squares
The least squares (LS) and weighted least squares (WLS) methods are well known for estimating unknown parameters (Swain, Venkatraman, and Wilson 1988). Here, we consider the two methods to estimate the unknown parameters of the UGIW distribution. Let $x_1, x_2, \ldots, x_n$ be the ordered observations obtained from a sample of size $n$ from the UGIW distribution. The LS and WLS estimates of α, λ and β can be obtained by minimizing the following function with respect to α, λ and β where $\theta = (\alpha, \beta, \lambda)$. The LS estimates, denoted by $\hat\alpha_{LSE}$, $\hat\lambda_{LSE}$ and $\hat\beta_{LSE}$, can be obtained by setting $\eta_i = 1$, while the WLS estimates, denoted by $\hat\alpha_{WLS}$, $\hat\lambda_{WLS}$ and $\hat\beta_{WLS}$, can be obtained by setting $\eta_i = (n+1)^2(n+2)/[i(n-i+1)]$. These estimates can also be obtained by solving the following equations where the $u_i$, $i = 1, 2, \ldots, n$, are computed from the ordered observations as defined earlier.
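In the usual formulation of Swain, Venkatraman, and Wilson (1988), the function to minimize is $\sum_{i=1}^{n} \eta_i [F(x_i; \theta) - i/(n+1)]^2$, with $\eta_i = 1$ for LS and the reciprocal variance of the ith uniform order statistic for WLS. The sketch below codes this objective for an arbitrary CDF passed in as a callable; the function name is hypothetical and the minimization over θ is left to a generic optimizer.

```python
def least_squares_objective(cdf, sorted_x, weighted=False):
    """LS (weighted=False) or WLS (weighted=True) distance between the fitted CDF
    at the ordered data and the plotting positions i / (n + 1)."""
    n = len(sorted_x)
    total = 0.0
    for i, x in enumerate(sorted_x, start=1):
        target = i / (n + 1)
        if weighted:
            # Reciprocal variance of the i-th uniform order statistic.
            weight = (n + 1) ** 2 * (n + 2) / (i * (n - i + 1))
        else:
            weight = 1.0
        total += weight * (cdf(x) - target) ** 2
    return total
```

For a UGIW fit, `cdf` would be the model CDF with the candidate θ fixed, for instance `lambda x: F(x, theta)` for some user-supplied `F`.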

Method of percentile
In this subsection, we estimate the unknown parameters of the UGIW distribution by the percentile method. This method was first introduced by Kao (1958) for estimating Weibull parameters. Let $p_i = i/(n+1)$ be the estimate of $F(x_i, \theta)$; then the percentile estimates of the parameters of the UGIW distribution, denoted by $\hat\alpha_{PE}$, $\hat\lambda_{PE}$ and $\hat\beta_{PE}$, can be obtained by minimizing the following function with respect to α, λ and β or equivalently by solving the following non-linear equations
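A common way to write the percentile criterion is $\sum_{i=1}^{n} [x_i - Q(p_i; \theta)]^2$, the squared distance between the ordered data and the model quantiles at $p_i = i/(n+1)$. The sketch below assumes this form, which may differ in detail from the function used in the paper, and takes the quantile function as a callable; the name `percentile_objective` is hypothetical.

```python
def percentile_objective(quantile, sorted_x):
    """Sum of squared differences between ordered data and model quantiles Q(p_i),
    with plotting positions p_i = i / (n + 1)."""
    n = len(sorted_x)
    total = 0.0
    for i, x in enumerate(sorted_x, start=1):
        p_i = i / (n + 1)
        total += (x - quantile(p_i)) ** 2
    return total
```

The criterion needs a closed-form or cheaply computable quantile function, which is why the method suits distributions obtained by simple transformations.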

Method of maximum product of spacing
According to Cheng and Amin (1983), the maximum product of spacing (MPS) estimates of the unknown parameters of the UGIW distribution are based on the differences between the values of the cdf at consecutive data points. Based on a random sample of size $n$ from the UGIW distribution, the uniform spacings can be defined as follows where $F(x, \theta)$ is the cdf given by (2), $F(x_0, \theta) = 0$ and $F(x_{n+1}, \theta) = 1$. The MPS estimates, denoted by $\hat\alpha_{MPS}$, $\hat\lambda_{MPS}$ and $\hat\beta_{MPS}$, can be obtained by maximizing with respect to α, λ and β or by solving the following equations where $\phi_1(u_i, \theta)$, $\phi_2(u_i, \theta)$ and $\phi_3(u_i, \theta)$ are given by (13), (14) and (15).
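In the standard Cheng and Amin formulation, the spacings are $D_i(\theta) = F(x_i, \theta) - F(x_{i-1}, \theta)$ for $i = 1, \ldots, n+1$, and the quantity to maximize is the mean of $\log D_i(\theta)$. The sketch below codes this criterion for an arbitrary CDF; it is a generic illustration with a hypothetical function name, not the paper's implementation.

```python
import math

def mps_objective(cdf, sorted_x):
    """Mean log-spacing (1 / (n + 1)) * sum(log D_i), with F(x_0) = 0 and F(x_{n+1}) = 1."""
    values = [0.0] + [cdf(x) for x in sorted_x] + [1.0]
    total = 0.0
    for lower, upper in zip(values, values[1:]):
        spacing = upper - lower
        if spacing <= 0.0:
            # Tied or misordered observations give a degenerate spacing.
            return -math.inf
        total += math.log(spacing)
    return total / (len(values) - 1)
```

Because the function is maximized rather than minimized, a generic minimizer would be applied to its negative.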

Method of Cramér-von-Mises
The Cramér-von-Mises estimates (CMEs), denoted by $\hat\alpha_{CME}$, $\hat\lambda_{CME}$ and $\hat\beta_{CME}$, of α, λ and β can be obtained by minimizing the following function with respect to α, λ and β.
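The Cramér-von-Mises criterion is usually written as $C(\theta) = \frac{1}{12n} + \sum_{i=1}^{n} [F(x_i, \theta) - (2i-1)/(2n)]^2$ for ordered observations $x_i$. The sketch below assumes this standard form and accepts any CDF as a callable; the function name is hypothetical.

```python
def cvm_objective(cdf, sorted_x):
    """Cramér-von-Mises distance between the fitted CDF and the empirical CDF midpoints."""
    n = len(sorted_x)
    total = 1.0 / (12 * n)
    for i, x in enumerate(sorted_x, start=1):
        total += (cdf(x) - (2 * i - 1) / (2 * n)) ** 2
    return total
```

The constant $1/(12n)$ does not affect the minimizer but keeps the value equal to the usual Cramér-von-Mises statistic.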

Simulation results for complete data
It is not possible to compare the performance of the different estimators derived in the previous sections theoretically; therefore, we conduct a Monte Carlo simulation study to determine the best estimation method among the six classical estimation methods. We generate 10,000 random samples of different sample sizes and different parameter values. We replicate the process 1000 times and obtain the average of the estimates and the MSE in each case. From Tables 2-7, it is noted that the maximum likelihood method of estimation performs better than the other methods in terms of MSE in most of the cases. The next best performing estimator is the PE, followed by the LSE, in most of the cases. Finally, we note that the MSE decreases for all the methods of estimation as the sample size increases, which indicates that all the methods of estimation are consistent. From Tables 2-7, it is also observed that PE gives the smallest average widths (AWs) for α, while for β and λ, MLE performs better than the other methods of estimation. It is also observed that MLE performs better than the other methods of estimation in terms of coverage probabilities (CPs) in most of the cases.
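The structure of such a Monte Carlo comparison can be sketched as a loop that simulates a sample, applies an estimator and accumulates the average estimate and the MSE. In the sketch below the uniform sampler and the sample mean are toy stand-ins (assumptions) for a UGIW generator and one of the six estimators; all names are hypothetical.

```python
import random

def monte_carlo_summary(sampler, estimator, true_value, n, reps, seed=0):
    """Return (average estimate, MSE) of `estimator` over `reps` simulated samples of size n."""
    rng = random.Random(seed)
    estimates = [estimator(sampler(rng, n)) for _ in range(reps)]
    average = sum(estimates) / reps
    mse = sum((e - true_value) ** 2 for e in estimates) / reps
    return average, mse

def uniform_sampler(rng, n):
    """Toy data generator: n independent Uniform(0, 1) draws."""
    return [rng.random() for _ in range(n)]

def sample_mean(data):
    """Toy estimator of the Uniform(0, 1) mean, whose true value is 0.5."""
    return sum(data) / len(data)

average, mse = monte_carlo_summary(uniform_sampler, sample_mean, 0.5, n=50, reps=2000)
```

Repeating the call for increasing `n` and checking that the MSE shrinks is the empirical consistency check described in the text.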

Application with complete data
In this section, we provide one application to a real data set to illustrate the importance of the UGIW distribution presented in Section 2. The MLEs of the model parameters are computed and goodness-of-fit statistics are compared with those of rival models.
The data set consists of 63 observations of the strengths of 1.5 cm glass fibers, taken from Smith and Naylor (1987). We divide the data by 2.24 to obtain values between 0 and 1.
We compare the fit of the UGIW distribution with that of some competing models, whose densities are given by:
• The unit-Birnbaum-Saunders distribution (UB):
• The unit-Weibull distribution (UW):
The fitted models are compared using goodness-of-fit measures, namely the maximized log-likelihood under the model ($-l$), the Cramér-von Mises (CVM), Anderson-Darling (AD) and Kolmogorov-Smirnov (KS) statistics, and the KS p-value (PV). It is clear that the UGIW distribution fits the strengths of glass fibers data very well. Next, we obtain the estimates of the unknown parameters of the UGIW distribution using the six methods of estimation; the values of $-l$, KS and the corresponding PV are displayed in Table 9 for the strengths of glass fibers data set. The values in Table 9 reveal that the MLE method can be used to estimate the parameters of the UGIW distribution. However, all estimation methods perform well.

Maximum likelihood estimation with right censored data
Let $X = (X_1, X_2, \ldots, X_n)^T$ be a sample from the UGIW distribution with parameter vector $\theta = (\alpha, \lambda, \beta)^T$, which may contain right censored data with fixed censoring time τ. Each $X_i$ can be written as The right censoring is assumed to be non-informative, so the log-likelihood function can be written as: The maximum likelihood estimators $\hat\alpha$, $\hat\lambda$ and $\hat\beta$ of the unknown parameters α, λ and β can be derived from the following nonlinear score equations: The explicit forms of $\hat\alpha$, $\hat\lambda$ and $\hat\beta$ cannot be obtained, so we use numerical methods.

Estimated Fisher information matrix
The components of the estimated information matrix $\hat{I} = (\hat{\imath}_{ij})_{3 \times 3}$ are obtained by where α, λ and β are replaced by their MLEs $\hat\alpha$, $\hat\lambda$ and $\hat\beta$.

Test statistic for right censored data
Let $X_1, \ldots, X_n$ be $n$ i.i.d. random variables grouped into $k$ classes $I_j$. To assess the adequacy of a parametric model $F_0$ when data are right censored and the parameter vector θ is unknown, Bagdonavičius and Nikulin (2011) proposed a test statistic $Y^2$ based on the vector This vector represents the differences between the observed and the expected numbers of failures ($U_j$ and $e_j$) falling into the grouping intervals $I_j = (a_{j-1}, a_j]$ with $a_0 = 0$, $a_k = \tau$, where τ is a finite time. The authors considered the $a_j$ as random functions of the data, chosen such that the $k$ intervals have equal expected numbers of failures $e_j$.
The test statistic $Y^2$ is defined by where $Z = (Z_1, \ldots, Z_k)^T$, $\Sigma^{-}$ is a generalized inverse of the covariance matrix $\Sigma$, and $\hat\theta$ is the maximum likelihood estimator of θ based on the initial non-grouped data.
Under the null hypothesis $H_0$, the limit distribution of the statistic $Y^2$ is chi-square with $k = \mathrm{rank}(\Sigma)$ degrees of freedom. The description and applications of modified chi-square tests are discussed in Voinov, Nikulin, and Balakrishnan (2013).
The interval limits $a_j$ for grouping the data into the classes $I_j$ are considered as functions of the data and defined by $\hat{a}_j$ such that the expected numbers of failures $e_j$ falling into these intervals are $e_j = E_k/k$ for any $j$, with $E_k = \sum_{i=1}^{n} H(u_i, \hat\theta)$. The distribution of the test statistic $Y_n^2$ is chi-square (see Voinov et al. (2013)).

Criteria test for UGIWD
For testing the null hypothesis $H_0$ that the data belong to the UGIW model, we construct a modified chi-squared type goodness-of-fit test based on the statistic $Y^2$. Suppose that τ is a finite time and the observed data are grouped into $k > s$ sub-intervals $I_j = (a_{j-1}, a_j]$ of $[0, \tau]$. The interval limits $a_j$ are considered as random variables such that the expected numbers of failures in each interval $I_j$ are the same, so the expected numbers of failures $e_j$ are obtained as

Estimated matrix $\hat{W}$
The components of the estimated matrix $\hat{W}$ are derived from the estimated matrix $\hat{C}$, which is given by: Therefore, the quadratic form of the test statistic can be obtained easily as:

Simulation results for censored data
We generated $N$ = 10,000 right censored samples with different sample sizes and different parameter values from the UGIW model. Using the R statistical software and the Barzilai-Borwein (BB) algorithm (Ravi and Gilbert 2009), we calculate the averages of the simulated maximum likelihood estimates of the unknown parameters and their corresponding mean squared errors (MSEs). The results are presented in Table 10. From Table 10, we notice that the mean squared errors are very small, which supports the convergence of the maximum likelihood estimators. To study the performance of the test statistic proposed in this work, a further simulation study has been carried out. For testing the null hypothesis $H_0$ that a sample comes from the UGIW distribution, we draw 10,000 samples from the UGIW model with different sample sizes and different parameter values and calculate the $Y^2$ statistic for each. We then count the number of rejections of $H_0$, that is, the cases in which the value of $Y^2$ exceeds $\chi^2_{\varepsilon}(k)$ (the quantile of the chi-square distribution with $k$ degrees of freedom). Table 11 compares the theoretical significance levels ($\varepsilon = 0.10$, $\varepsilon = 0.05$, $\varepsilon = 0.01$) with the corresponding simulated (empirical) levels. As can be seen, the empirical levels of the $Y^2$ test are very close to the corresponding theoretical levels of the chi-squared distribution with $k$ degrees of freedom. Thus, we conclude that the proposed test is well suited to the UGIW distribution.
We use the test statistic provided above to verify whether the above data set can be modeled by the UGIW distribution. To this end, we first calculate the maximum likelihood estimates of the unknown parameters: $\hat\theta = (\hat\alpha, \hat\lambda, \hat\beta)^T = (2.1865, 5.623, 3.2152)^T$.
To calculate the test statistic $Y_n^2$, we need the following results (see Table 12). We choose $k = 5$ grouping intervals $I_j$.
So, we obtain the value of $Y_n^2$ as $Y_n^2 = X^2 + Q = 4.1387 + 1.9568 = 6.0955$. For significance level $\varepsilon = 0.05$, the critical value $\chi^2_5 = 11.0705$ is greater than the value of $Y_n^2 = 6.0955$, so we can say that the proposed UGIW model fits these data. We also calculated the test statistic $Y_n^2$ to fit the data set to the competing models. The results are given in Table 13.
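The decision step can be checked with a few lines of code. The sketch below recomputes $Y_n^2$ from the two reported components and evaluates the upper-tail probability of the chi-square distribution with a plain series for the regularized incomplete gamma function, so that no external library is needed. The function names are hypothetical, and the series is meant for moderate arguments only.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(chi2_df > x), using the series expansion of the
    regularized lower incomplete gamma function (adequate for moderate x)."""
    if x <= 0.0:
        return 1.0
    a, half_x = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 0
    while abs(term) > 1e-16 * abs(total):
        n += 1
        term *= half_x / (a + n)
        total += term
    lower = total * math.exp(-half_x + a * math.log(half_x) - math.lgamma(a))
    return 1.0 - lower

def not_rejected(y2, df, level):
    """True when the p-value is at least `level`, i.e. H0 is not rejected."""
    return chi2_sf(y2, df) >= level

y2 = 4.1387 + 1.9568              # X^2 + Q as reported in the text
model_retained = not_rejected(y2, df=5, level=0.05)
```

Comparing the p-value with the level is equivalent to comparing $Y_n^2$ with the critical value 11.0705 quoted in the text.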

Concluding remarks
In this study, a new bounded distribution on the interval (0, 1) has been introduced by a transformation method; it provides better fits than the unit-Birnbaum-Saunders, unit-Weibull, unit-Gompertz, unit Burr-III and Kumaraswamy distributions. Some statistical properties have been derived. The unknown parameters of the UGIW distribution are estimated by six different frequentist methods of estimation, and the corresponding confidence intervals are obtained. The practical applicability of the UGIW distribution has been illustrated by means of one real-life data application. Next, we provide the formulae of the criterion statistic of the modified chi-squared goodness-of-fit test for the UGIW model when data are right censored and the parameters are unknown. The statistic $Y^2$ can be used to check the validity of the UGIW model. The main advantage of the chi-square goodness-of-fit tests for censored data is that the limiting distribution of these statistics is the well-known $\chi^2$ distribution. The performance of the results and the effectiveness of the proposed test are shown by a simulation study and real data analysis. We hope that the results obtained through this study will be useful for practitioners in several fields.