Estimations of the Parameters of Generalised Exponential Distribution under Progressive Interval Type-I Censoring Scheme with Random Removals

This article addresses point and interval estimation of the parameters of the generalised exponential distribution (GED) under a progressive interval type-I (PITI) censoring scheme with random removals. This censoring scheme is most useful when continuous examination of the units is not possible. Maximum likelihood, expectation-maximization and Bayesian procedures are developed for estimating the parameters of the GED from a PITI censored sample. Real datasets are used to illustrate the applicability of the proposed work. Further, the performance of the proposed estimators under PITI censoring is compared with that under the complete sample.


Introduction
In the statistical literature, several authors have proposed models intended to compete with the gamma and Weibull distributions (see Mudholkar and Srivastava (1993), Kondolf and Adhikari (2000), etc.). Similarly, Gupta and Kundu (2001a) introduced the generalised exponential distribution (GED) as an alternative to the gamma and Weibull distributions. It has since gained popularity in the statistical literature because of its simplicity and because its probability density function (pdf) is very flexible and accommodates a wide variety of shapes. The pdf of the GED is given as
$$f(x; \alpha, \theta) = \alpha \theta e^{-\theta x} \left(1 - e^{-\theta x}\right)^{\alpha - 1}, \quad x > 0, \ \alpha > 0, \ \theta > 0, \qquad (1)$$
where α is the shape parameter and θ is the scale parameter of the considered model. Its cumulative distribution and survival functions are given by
$$F(x; \alpha, \theta) = \left(1 - e^{-\theta x}\right)^{\alpha} \quad \text{and} \quad S(x; \alpha, \theta) = 1 - \left(1 - e^{-\theta x}\right)^{\alpha},$$
respectively. The GED has been extensively studied by Raqab and Madi (2005), Singh, Singh, Singh, and Prakash (2008) and many others. Gupta and Kundu (2001a), Jaheen (2004), Sarhan (2007) and Zheng (2002) discuss its advantages over the gamma and Weibull distributions, the two most popular distributions in survival analysis. Gupta and Kundu (2001a) noted that, in many situations, the two-parameter GED provides a better fit than the two-parameter Weibull distribution. It may be noted here that the GED is a special case of a distribution that was used by Gompertz (1825). Gupta and Kundu (2001b) studied different methods of point estimation for the GED parameters based on complete samples, including maximum likelihood estimation, the method of moments and the probability plot method. Singh, Singh, and Kumar (2011) discussed parameter estimation and the reliability characteristics of the GED under the Bayesian paradigm. It is worth mentioning that little attention has been paid to inference based on censored samples from the GED under the Bayesian paradigm, although censoring is quite common in clinical and life testing experiments.
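The distribution functions above can be sketched in a few lines of Python. This is only an illustration, assuming the form F(x) = (1 − e^{−θx})^α, which reproduces the values S(1) = 0.3048 and H(1) = 1.7851 reported with Table 3 for α = 2.5 and θ = 2. The function names, and the quantile function (obtained by inverting F), are additions for illustration and do not appear in the paper.

```python
import math

def ged_cdf(x, alpha, theta):
    """Assumed GED cdf: F(x) = (1 - exp(-theta*x))**alpha for x > 0, else 0."""
    if x <= 0:
        return 0.0
    return (1.0 - math.exp(-theta * x)) ** alpha

def ged_pdf(x, alpha, theta):
    """Density: alpha * theta * exp(-theta*x) * (1 - exp(-theta*x))**(alpha - 1)."""
    if x <= 0:
        return 0.0
    e = math.exp(-theta * x)
    return alpha * theta * e * (1.0 - e) ** (alpha - 1.0)

def ged_survival(x, alpha, theta):
    """Survival (reliability) function S(x) = 1 - F(x)."""
    return 1.0 - ged_cdf(x, alpha, theta)

def ged_hazard(x, alpha, theta):
    """Hazard rate H(x) = f(x) / S(x)."""
    return ged_pdf(x, alpha, theta) / ged_survival(x, alpha, theta)

def ged_quantile(q, alpha, theta):
    """Inverse cdf for 0 < q < 1, from solving F(x) = q for x."""
    return -math.log(1.0 - q ** (1.0 / alpha)) / theta

# Reliability and hazard rate at t = 1 for alpha = 2.5, theta = 2.
print(round(ged_survival(1.0, 2.5, 2.0), 4), round(ged_hazard(1.0, 2.5, 2.0), 4))
```

The printed values can be compared with the true reliability and hazard rate quoted in the notes to Table 3.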
Situations do arise in which units under study are lost or removed from the experiment while they are still alive; in such cases we obtain censored data. If the point at which the experiment terminates is time dependent, the scheme is called Type-I censoring. If it is unit dependent, it is called Type-II censoring. Depending on the need and practical considerations, various modified forms of censoring schemes have been discussed in the literature. Aggarwala (2001) proposed a combination of interval Type-I censoring and progressive censoring, called progressive Type-I (PTI) interval censoring, which arises naturally in most clinical experiments. To visualize this censoring scheme, consider an experiment with n bladder cancer patients for whom remission times are to be recorded. The patients are called for regular check-ups at scheduled times, and those who turn up are checked. At the first visit, scheduled at time $T_1$, only $n - R_1$ of the n patients report, i.e. $R_1$ patients leave the experiment during the time interval $(0, T_1]$. The experimenter examines these $n - R_1$ patients and finds that cancer has recurred in $D_1$ of them. The exact time of recurrence for these $D_1$ patients is not known to the experimenter; only the number of recurrences between the start of the experiment and the first visit is known. By the second visit, scheduled at time $T_2$, a further $R_2$ patients have left the experiment (during the time interval $(T_1, T_2]$). The experimenter examines the remaining $n - R_1 - D_1 - R_2$ patients and finds that cancer has recurred in $D_2$ of them, and the experiment continues in this way until the m-th visit. At the m-th visit all the remaining units are removed, i.e. the experiment is terminated.
Recently, Chen and Lio (2010) proposed a methodology to estimate the parameters of the GED under PTI interval censoring, assuming that the proportions $p_i$ of patients leaving the experiment during $(T_{i-1}, T_i]$ are known in advance; i.e. they prefixed the proportions $p_1, p_2, \ldots, p_m$ and assumed that at the i-th stage $\lfloor n_i p_i \rfloor$ patients leave the experiment. Here, $\lfloor n_i p_i \rfloor$ denotes the largest integer less than or equal to $n_i p_i$. The authors' claim that exactly $\lfloor n_i p_i \rfloor$ patients out of $n_i$ will drop out of the experiment at the i-th stage (visit) seems unrealistic and hypothetical. In fact, the number of patients dropping out of a clinical trial at any stage is beyond the control of the experimenter and cannot be predetermined. It seems more logical and natural to regard each $p_i$ as the risk of dropping at the i-th stage and the number of droppings as a random variable. Perhaps with a similar thought in mind, Yuen and Tse (1996) and Tse, Yang, and Yuen (2000) discussed the progressive censoring scheme with binomial removals. Ashour and Afify (2007) used the PITI censoring scheme with binomial removals, assuming that the exact values of the lifetimes of the units are observable. In their studies, the number of removals $R_i$ at the i-th stage (i = 1, 2, ..., m) is random and follows a binomial distribution with probability $p_i$. Thus, $R_1 \sim \text{Binomial}(n, p_1)$ and $R_2 \sim \text{Binomial}(n - D_1 - R_1, p_2)$. In general, the number of units dropping at the i-th stage, $R_i$, follows a binomial distribution with parameters $\left(n - \sum_{j=1}^{i-1} (D_j + R_j), \ p_i\right)$.
In this paper, we consider PITI censored data with binomial removals and develop estimators for the shape and scale parameters when the exact lifetimes of the units are not observable and only the number of units failing in each specified time interval is known. For the parameter estimation problem, we have considered the most popular loss function, namely the squared error loss function (SELF), which can be easily justified on the grounds of minimum variance unbiased estimation (see Berger 2013, Ch. 2). We compare the performance of the proposed estimators obtained under the above censoring scheme with that of the estimators under the complete sample case.
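The censoring scheme described above can be illustrated by simulating the stage-wise counts directly. The following Python sketch is not the authors' code. It assumes the GED cdf F(x) = (1 − e^{−θx})^α, draws the number of droppings first, as in the clinical narrative, and then draws the failures among the units still under observation. The function names and the parameter values in the usage lines are hypothetical.

```python
import math
import random

def ged_cdf(x, alpha, theta):
    """Assumed GED cdf F(x) = (1 - exp(-theta*x))**alpha for x > 0."""
    return (1.0 - math.exp(-theta * x)) ** alpha if x > 0 else 0.0

def binomial(n, p, rng):
    """Draw from Binomial(n, p) as a sum of n Bernoulli trials."""
    return sum(1 for _ in range(n) if rng.random() < p)

def simulate_piti(n, times, p, alpha, theta, rng):
    """Simulate failures d_i and droppings r_i for each interval (T_{i-1}, T_i].

    Returns (d, r, left), where left is the number of units still on test
    at the last inspection, all of which are then removed.
    """
    at_risk, prev_cdf = n, 0.0
    d, r = [], []
    for t_i, p_i in zip(times, p):
        r_i = binomial(at_risk, p_i, rng)            # R_i ~ Bin(n_i, p_i)
        cdf_i = ged_cdf(t_i, alpha, theta)
        # Probability of failing in the interval, given survival to T_{i-1}.
        q_i = (cdf_i - prev_cdf) / (1.0 - prev_cdf)
        d_i = binomial(at_risk - r_i, q_i, rng)      # failures among those examined
        d.append(d_i)
        r.append(r_i)
        at_risk -= d_i + r_i
        prev_cdf = cdf_i
    return d, r, at_risk

# Hypothetical setting: n = 30 units, eight inspections, dropping risk 0.2 per stage.
times = [0.2 * k for k in range(1, 9)]
print(simulate_piti(30, times, [0.2] * 8, 2.5, 2.0, random.Random(7)))
```

Every unit ends up in exactly one of the three groups (failed, dropped, or still on test at the end), so the counts always add up to n.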
The rest of the paper is organized as follows. In Section 2, classical and Bayes procedures for estimating the model parameters from PITI censored samples with binomial removals are developed. In Section 3, two real datasets are used to illustrate the proposed methodology: the first concerns the survival times of patients with plasma cell myeloma, and the second the number of million revolutions before failure of deep groove ball bearings. A comparison of the estimators based on a simulation study is provided in Section 4. Finally, conclusions are summarized in Section 5.

Maximum likelihood estimation
In this section, we provide the MLEs of α and θ, the parameters of the lifetime distribution given in equation (1). Suppose that n units are put on test at time $T_0 = 0$, and that the number of droppings and the number of failures among the available units are recorded during the pre-specified time intervals $(T_{i-1}, T_i]$ (i = 1, 2, ..., m). The data thus consist of the numbers of failures $D = (d_1, d_2, \ldots, d_m)$ and the numbers of droppings $R = (r_1, r_2, \ldots, r_m)$ during the time intervals $(0, T_1], (T_1, T_2], \ldots, (T_{m-1}, T_m]$, obtained through the censoring scheme described in the previous section. The individual units drop from the test at random at each stage. Therefore, the number $r_1$ of units dropping at the first stage follows a binomial distribution with parameters $(n, p_1)$ and, in general, $r_i$ follows a binomial distribution with parameters $\left(n - \sum_{j=1}^{i-1} (d_j + r_j), \ p_i\right)$.
The complete likelihood for the observed data factorizes into two parts, $L_1(\cdot)$ and $L_2(\cdot)$, of which $L_2(\cdot)$ is free from α and θ. Thus, to compute the ML estimates of α and θ, only $L_1(\cdot)$ is required. Differentiating the corresponding log-likelihood function with respect to α and θ gives the likelihood equations (8) and (9), and the MLEs of α and θ are obtained by solving these two equations simultaneously. Explicit solutions cannot be obtained from these equations, so we propose the use of a suitable numerical technique to solve the two non-linear equations. One may use Newton-Raphson, simulated annealing or their variants. This can be done routinely using R or other packages. We have also obtained the observed information matrix; the second partial derivatives of the log-likelihood function, $L_{\alpha\alpha}$, $L_{\alpha\theta}$, $L_{\theta\alpha}$ and $L_{\theta\theta}$, are provided in Appendix A. Based on it, the asymptotic confidence (AC) intervals and standard errors of the parameter estimates can be obtained in the usual way. While using the Newton-Raphson algorithm (the details are provided in the simulation section) to compute the MLEs of the parameters, it is observed that the iterations converge approximately 85%-90% of the time.
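The numerical step can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the kernel $L_1$ is taken to be the standard progressive type-I interval censoring likelihood of Aggarwala (2001), $\prod_i [F(T_i) - F(T_{i-1})]^{d_i} [1 - F(T_i)]^{r_i}$, with removed units treated as withdrawn alive at $T_i$; derivatives are approximated by central differences instead of the analytic expressions of Appendix A; and a step-halving (damped) Newton-Raphson is run on (log α, log θ) so that both parameters stay positive. The counts in the usage lines are those reported in Section 3 for the ball-bearing data under inspection scheme A and dropping scheme 1; the starting point is hypothetical.

```python
import math

def ged_cdf(x, alpha, theta):
    """Assumed GED cdf F(x) = (1 - exp(-theta*x))**alpha for x > 0."""
    return (1.0 - math.exp(-theta * x)) ** alpha if x > 0 else 0.0

def loglik(alpha, theta, times, d, r):
    """Assumed kernel log L1: d_i failures in (T_{i-1}, T_i], r_i removals at T_i."""
    if alpha <= 0 or theta <= 0:
        return -math.inf
    total, prev = 0.0, 0.0
    for t_i, d_i, r_i in zip(times, d, r):
        cur = ged_cdf(t_i, alpha, theta)
        if not (prev < cur < 1.0):
            return -math.inf
        total += d_i * math.log(cur - prev) + r_i * math.log(1.0 - cur)
        prev = cur
    return total

def grad_hess(f, z, h=1e-4):
    """Central-difference gradient and Hessian of f at the 2-vector z."""
    def at(dz0, dz1):
        return f([z[0] + dz0, z[1] + dz1])
    f0 = at(0.0, 0.0)
    g = [(at(h, 0.0) - at(-h, 0.0)) / (2 * h), (at(0.0, h) - at(0.0, -h)) / (2 * h)]
    h00 = (at(h, 0.0) - 2 * f0 + at(-h, 0.0)) / h ** 2
    h11 = (at(0.0, h) - 2 * f0 + at(0.0, -h)) / h ** 2
    h01 = (at(h, h) - at(h, -h) - at(-h, h) + at(-h, -h)) / (4 * h * h)
    return g, [[h00, h01], [h01, h11]]

def newton_mle(times, d, r, start, tol=1e-8, max_iter=200):
    """Damped Newton-Raphson on (log alpha, log theta).

    Returns (alpha_hat, theta_hat, standard_errors); standard_errors is None
    when the Hessian at the final point is not negative definite.
    """
    f = lambda z: loglik(math.exp(z[0]), math.exp(z[1]), times, d, r)
    z = [math.log(start[0]), math.log(start[1])]
    for _ in range(max_iter):
        g, H = grad_hess(f, z)
        det = H[0][0] * H[1][1] - H[0][1] ** 2
        if H[0][0] < 0 and det > 0:          # negative definite: Newton direction
            step = [-(H[1][1] * g[0] - H[0][1] * g[1]) / det,
                    -(H[0][0] * g[1] - H[0][1] * g[0]) / det]
        else:                                # otherwise use the gradient direction
            step = list(g)
        biggest = max(abs(s) for s in step)
        if biggest > 2.0:                    # cap the move (in log units)
            step = [2.0 * s / biggest for s in step]
        t, f_old = 1.0, f(z)
        # Halve the step until the log-likelihood does not decrease.
        while t > 1e-12 and not (f([z[0] + t * step[0], z[1] + t * step[1]]) >= f_old):
            t /= 2.0
        if t <= 1e-12:
            break
        z = [z[0] + t * step[0], z[1] + t * step[1]]
        if t * max(abs(s) for s in step) < tol:
            break
    alpha, theta = math.exp(z[0]), math.exp(z[1])
    _, H = grad_hess(f, z)
    det = H[0][0] * H[1][1] - H[0][1] ** 2
    ses = None
    if H[0][0] < 0 and det > 0:
        # At a stationary point, Cov(alpha, theta) = J (-H)^{-1} J with J = diag(alpha, theta).
        ses = (alpha * math.sqrt(-H[1][1] / det), theta * math.sqrt(-H[0][0] / det))
    return alpha, theta, ses

times = [20, 40, 60, 80, 100, 120, 140]
d = [1, 2, 8, 4, 3, 2, 2]
r = [0, 0, 0, 0, 0, 0, 1]                    # one unit removed at the last stage
print(newton_mle(times, d, r, start=(1.0, 0.05)))
```

Working on the log scale and halving the step are design choices made here to reduce the risk of divergence; the paper reports that the plain Newton-Raphson iterations converge only about 85%-90% of the time.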

Bayesian estimation
In this section, we provide Bayesian inference for α and θ when the data are progressive interval type-I censored, as illustrated in Figure 1. We have also obtained the highest posterior density (HPD) intervals for both parameters. Before proceeding further, we select the prior distributions of the parameters. Following Berger and Sun (1993), Raqab and Madi (2005) and Singh, Singh, and Kumar (2014), it is assumed that α and θ are independent gamma variates, with the pdfs given in (11) and (12). Here, all the hyperparameters $\lambda_1$, $\nu_1$, $\lambda_2$ and $\nu_2$ are assumed to be known and can be evaluated following the method suggested by Singh, Singh, and Kumar (2013). We compute the Bayes estimates of the unknown parameters under the squared error loss function. Combining the priors given in (11) and (12) with the likelihood function (4) gives the joint posterior density of α and θ for the given data. Let h(·) be a function of α and θ. Then the Bayes estimator of h(·) under the squared error loss function is its posterior expectation. It is clear from expression (13) that the estimators have no closed form, so we suggest using an MCMC procedure to compute the Bayes estimates. After obtaining MCMC samples from the posterior distribution, we can find the Bayes estimate of the parameters as the average of the draws retained after burn-in,
$$\hat{\Theta} = \frac{1}{N - N_0} \sum_{i=N_0+1}^{N} \Theta_i,$$
where N is the total number of draws, $N_0$ is the burn-in period of the Markov chain and $\Theta_i = [\alpha_i, \theta_i]$. To compute the HPD interval of Θ, order the MCMC sample of each parameter and form the candidate credible intervals of the required coverage from the ordered values. The HPD credible interval of α or θ is then the candidate interval with the shortest length.
In order to obtain the MCMC samples from the joint posterior density of α and θ, we use the Metropolis-Hastings (M-H) algorithm. We consider a bivariate normal distribution as the proposal density, i.e. $N_2(\mu, \Sigma)$, where Σ is the variance-covariance matrix. Observations generated from the bivariate normal distribution may be negative, which is not possible because the parameters under consideration are positive valued. Therefore, we take the absolute value of the generated observations. The M-H algorithm associated with the target density π(·) and the proposal density $N_2(\mu, \Sigma)$ then produces a Markov chain $\Theta_i$ in the usual way: a candidate value is generated from the proposal density, u is drawn from uniform(0, 1), and the candidate is accepted or rejected by comparing u with the acceptance probability.
In using the above algorithm, the problem arises as to how to choose the initial guess. Here, we propose the use of the MLEs of (α, θ), obtained by the method described in Section 2.1, as initial values for the MCMC process. The choice of the covariance matrix Σ is also an important issue; see Ntzoufras (2009) for details. One choice for Σ is the asymptotic variance-covariance matrix $I^{-1}(\alpha, \theta)$. While generating M-H samples with $\Sigma = I^{-1}(\alpha, \theta)$, we noted that the acceptance rate for such a choice of Σ is about 15%. By acceptance rate, we mean the proportion of iterations at which a new set of values is accepted. It is well known that if the acceptance rate is low, a good strategy is to make a small pilot run using a diagonal Σ to obtain a rough estimate of the correlation structure of the target posterior distribution, and then to re-run the algorithm using the corresponding estimated variance-covariance matrix; for more details see Gelman, Carlin, Stern, and Rubin (1995, pp. 334-335). We have therefore used this strategy for the calculations in the following sections.
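The sampler can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. It assumes the Aggarwala-type kernel $\prod_i [F(T_i) - F(T_{i-1})]^{d_i} [1 - F(T_i)]^{r_i}$ for the likelihood, gamma priors parameterised by hypothetical (shape, rate) pairs, and a normal random-walk proposal with a diagonal covariance matrix centred at the current value, with negative draws reflected by taking absolute values. Because the reflected random walk is symmetric, the acceptance probability reduces to the ratio of posterior densities. The data are the ball-bearing counts of Section 3; the starting value and proposal scales are hypothetical.

```python
import math
import random

def log_posterior(alpha, theta, times, d, r, prior_alpha, prior_theta):
    """Log posterior up to a constant: assumed kernel L1 plus gamma(shape, rate) priors."""
    if alpha <= 0 or theta <= 0:
        return -math.inf
    total, prev = 0.0, 0.0
    for t_i, d_i, r_i in zip(times, d, r):
        cur = (1.0 - math.exp(-theta * t_i)) ** alpha      # assumed GED cdf at T_i
        if not (prev < cur < 1.0):
            return -math.inf
        total += d_i * math.log(cur - prev) + r_i * math.log(1.0 - cur)
        prev = cur
    for value, (shape, rate) in ((alpha, prior_alpha), (theta, prior_theta)):
        total += (shape - 1.0) * math.log(value) - rate * value
    return total

def metropolis_hastings(log_target, start, scales, n_iter, rng):
    """Random-walk M-H with reflected normal proposals; returns (chain, acceptance rate)."""
    current = list(start)
    current_lp = log_target(*current)
    chain, accepted = [], 0
    for _ in range(n_iter):
        # Propose around the current value; abs() reflects negative draws.
        proposal = [abs(rng.gauss(c, s)) for c, s in zip(current, scales)]
        proposal_lp = log_target(*proposal)
        # Draw u from uniform(0, 1) and accept with probability min(1, ratio).
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            current, current_lp = proposal, proposal_lp
            accepted += 1
        chain.append(tuple(current))
    return chain, accepted / n_iter

def posterior_mean(chain, burn_in):
    """Bayes estimate under squared error loss: mean of the draws kept after burn-in."""
    kept = chain[burn_in:]
    return tuple(sum(draw[k] for draw in kept) / len(kept) for k in range(2))

times = [20, 40, 60, 80, 100, 120, 140]
d = [1, 2, 8, 4, 3, 2, 2]
r = [0, 0, 0, 0, 0, 0, 1]
# Shape = rate = 0 gives the non-informative prior proportional to 1 / (alpha * theta).
target = lambda a, t: log_posterior(a, t, times, d, r, (0.0, 0.0), (0.0, 0.0))
chain, rate = metropolis_hastings(target, (5.0, 0.03), (0.8, 0.005), 5000, random.Random(1))
print(posterior_mean(chain, burn_in=1000), rate)
```

In practice the starting value would be the MLE and the proposal covariance would be tuned from a pilot run, as described above; the diagonal scales used here are placeholders.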

Real data application
In this section, we illustrate the proposed methodology with real examples. The first dataset represents the survival times of patients with plasma cell myeloma, reported in Carbone, Kellerhouse, and Gehan (1967). The data contain the response time to therapy of 112 patients with plasma cell myeloma (a tumour of the bone marrow composed of cells normally found in bone marrow) treated at the National Cancer Institute, Bethesda, Maryland. Figure 2 shows the contour plot of the negative log-likelihood for this dataset. Every inner ellipse has a smaller value than the ellipses outside it, so the innermost ellipse has the minimum value. In other words, the minimum of the negative log-likelihood (the maximum of the likelihood) lies within the innermost ellipse.
We used an arbitrary point (1, 0.05) from this innermost ellipse as an initial guess. The MLEs for the dataset were then calculated using the procedure explained in Section 2.1, giving $\hat{\alpha}_{ML} = 1.4325$ and $\hat{\theta}_{ML} = 0.0571$. Similarly, the 95% asymptotic confidence interval for α is obtained as (0.9706, 1.8944) and that for θ as (0.0420, 0.0727).
To compute the Bayes estimates for the considered dataset, we used the MCMC technique discussed in Section 2.2. Following Robert (2015), we ran three MCMC chains with initial values selected as the MLE, the MLE − (asymptotic standard deviation) and the MLE + (asymptotic standard deviation), respectively. Figure 3 shows the iterations and density plots of the samples generated from the posterior distribution using the MCMC technique. From this figure, we see that all three chains have converged and are well mixed. It is further noted that the posterior of α is approximately symmetric, but the posterior of θ is left skewed. Using these MCMC samples, we computed the Bayes estimates following the method discussed in Section 2.2 and obtained $\hat{\alpha}_{B} = 1.4301$ and $\hat{\theta}_{B} = 0.0581$ under non-informative independent priors. The 95% highest posterior density (HPD) interval estimate for α is obtained as (1.0001, 1.6109) and that for θ as (0.0424, 0.0719).
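The HPD intervals quoted above are obtained from ordered MCMC draws as the shortest interval with the required coverage. A minimal Python sketch of that search is given below; it assumes the usual construction in which every candidate interval spans the same number of ordered draws. The function name and the toy numbers are hypothetical and are not taken from the paper.

```python
def hpd_interval(draws, cred=0.95):
    """Shortest interval covering a fraction `cred` of the sorted draws."""
    xs = sorted(draws)
    n = len(xs)
    k = int(cred * n + 1e-9)          # number of order-statistic steps each candidate spans
    if k < 1 or k >= n:
        raise ValueError("need more draws for this credibility level")
    # Candidates are (xs[j], xs[j + k]); keep the one with the smallest width.
    best = min(range(n - k), key=lambda j: xs[j + k] - xs[j])
    return xs[best], xs[best + k]

# Toy right-skewed sample: the shortest 80% interval avoids the long tail.
print(hpd_interval([0, 1, 2, 3, 4, 5, 6, 7, 8, 100], cred=0.8))
```

For a skewed posterior the HPD interval is shifted towards the region of high density, which is why it can be shorter than an equal-tailed interval of the same coverage.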
The second dataset arose in tests on the endurance of deep groove ball bearings. The data give the number of million revolutions before failure for each of the 23 ball bearings in the life test and have been reported by Lawless (2002, p. 228). The data points are exact observations. To illustrate our methodology, we have generated censored data for a prefixed number of inspections by specifying the inspection times and dropping probabilities.
We fixed the experimentation time at 140 units of time and decided to have 7 inspections during this period. We have considered four different inspection plans. The first plan consists of equally spaced inspection times, i.e. at 20, 40, ..., 140 units of time. The second inspection plan is motivated by the notion that if the probability of failure is high during some time interval, an early inspection should be scheduled. The third inspection plan is designed on the basis of the estimated cdf; although such a plan is not feasible in practice, we have included it for theoretical interest. First, we calculate $u = F(140; \hat{\alpha}_{ML}, \hat{\theta}_{ML})$; the earlier inspection times are then obtained from the estimated cdf, and $T_7 = 140$. The fourth inspection plan is chosen so as to have approximately equal probability of failure in each inspection interval, with the inspection times rounded to the nearest multiple of 10. The dropping schemes are selected in the following manner. The first scheme takes the risk of dropping at all the intermediate stages to be zero, i.e. $p_1 = p_2 = \cdots = p_7 = 0$, $p_8 = 1$. In the second scheme, the risk at all stages is equal but non-zero, i.e. $p_1 = p_2 = \cdots = p_7 = 0.2$, $p_8 = 1$. The third scheme is constructed so that the risk of dropping is low in the earlier stages and high in the later stages. In contrast, in the fourth scheme the risk of dropping is high in the earlier stages and low in the later stages. Lastly, we consider the case in which the risk is high at the first stage but there is no risk at the other stages. These inspection schemes and dropping schemes are summarized in Table 1b and Table 1a, respectively. Under dropping scheme 1 and inspection scheme A, we obtained the numbers of failures at the seven stages as 1, 2, 8, 4, 3, 2, 2, respectively, and one dropping at the last stage. Following the same procedure as in the previous example, we calculated the ML and Bayes estimates with the corresponding interval estimates for this dataset. The result is summarized in the first row of Table 2. The last row of the table provides the ML and Bayes estimates with the corresponding interval estimates for the complete dataset.
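The conversion of the exact lifetimes into PITI censored counts can be sketched as follows. This Python illustration is not the authors' code. The 23 ball-bearing lifetimes are typed in as they are commonly reproduced from Lawless and should be checked against that source before any serious use; the rule that each unit at risk drops independently with probability $p_i$ before the inspection is an assumption that yields binomial removal counts. With all intermediate dropping risks set to zero and equally spaced inspections, the sketch reproduces the counts 1, 2, 8, 4, 3, 2, 2 and the single final removal quoted above.

```python
import random

# Ball-bearing lifetimes in million revolutions (as commonly reproduced from
# Lawless; verify against the source before use).
BALL_BEARINGS = [17.88, 28.92, 33.00, 41.52, 42.12, 45.60, 48.48, 51.84,
                 51.96, 54.12, 55.56, 67.80, 68.64, 68.64, 68.88, 84.12,
                 93.12, 98.64, 105.12, 105.84, 127.92, 128.04, 173.40]

def piti_censor(lifetimes, times, p, rng):
    """Turn exact lifetimes into interval counts (d_i, r_i) with binomial removals.

    p[i] is the dropping risk for the interval ending at times[i]. Returns
    (d, r, left), where left is the number of units removed at the final inspection.
    """
    at_risk = list(lifetimes)
    d, r = [], []
    for t_i, p_i in zip(times, p):
        # Each unit at risk drops independently, so the count is Binomial(n_i, p_i).
        staying = [x for x in at_risk if rng.random() >= p_i]
        r.append(len(at_risk) - len(staying))
        # Failures observed at the inspection: lifetimes that ended by T_i.
        d.append(sum(1 for x in staying if x <= t_i))
        at_risk = [x for x in staying if x > t_i]
    return d, r, len(at_risk)

inspection_a = [20, 40, 60, 80, 100, 120, 140]      # equally spaced plan
print(piti_censor(BALL_BEARINGS, inspection_a, [0.0] * 7, random.Random(0)))
print(piti_censor(BALL_BEARINGS, inspection_a, [0.2] * 7, random.Random(0)))
```

The final removal of all surviving units, written as $p_8 = 1$ in the dropping schemes, corresponds to the third returned value.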
It is worth mentioning that the numbers of droppings are random and that the progressive interval type-I censored data are generated from the complete sample, so we can study the average performance of the estimators. For this purpose, we generated 2000 censored datasets, drawing the $r_i$ for the given $p_i$ and determining the $d_i$ accordingly from the complete dataset. Table 2 provides the average ML and Bayes estimates, along with the AC and HPD interval estimates of the parameters, based on the generated censored datasets. It may be seen from this table that the width of the interval estimates under dropping scheme 1, in which the risk of dropping at all stages is zero, is the smallest among all the dropping schemes. The width of the interval estimates under dropping scheme 2 is larger than under the others. Further, the interval width under the 4th scheme is smaller than that under the 3rd scheme. While studying the effect of the inspection times on the performance of the estimators, we noted that the average estimate under inspection scheme A and dropping scheme 1 is close to the estimate obtained for the complete sample. For the other inspection and dropping schemes, the average estimates are larger than that obtained for the complete sample. Similarly, the average width of the interval estimates under scheme A is the smallest among all the inspection schemes considered. The width of the interval estimates under scheme B is larger than under scheme A but smaller than under scheme C. The width of the interval estimates under scheme D is the largest. It is also noted that as the proportion of droppings increases, the width of the interval estimates increases.

Simulation study
In this section, we compare the performance of the various estimators on the basis of their bias and mean square error (MSE). Exact expressions for the bias and MSE cannot be obtained because the estimators are not in closed form. Therefore, the biases and MSEs are estimated by a Monte-Carlo simulation study of 2000 samples. For this purpose, we generated a specified number of observations from the distribution given in equation (1) for arbitrarily fixed values of the parameters under the specified censoring schemes and calculated the different estimates of α and θ following the procedure described in the previous sections. This process was repeated 2000 times to obtain the simulated biases and MSEs. We computed the MLEs using the Newton-Raphson algorithm; the resulting estimates of (α, θ) are denoted by $(\hat{\alpha}_{ML}, \hat{\theta}_{ML})$. It is noted that the Newton-Raphson algorithm converges in 85%-90% of the cases. We have reported the results omitting the cases in which the algorithm does not converge. To simulate a progressive interval type-I censored sample from the considered distribution, we used the algorithm given by Balakrishnan and Cramer (2014, p. 200) after modifying step 4 as follows: determine the number of droppings at the i-th stage by generating $r_i$ from $\text{Bin}\left(n - \sum_{j=1}^{i-1} (d_j + r_j), \ p_i\right)$. The MSE and bias of these estimators depend on the sample size n, the values of α and θ, and the hyperparameters $\lambda_1$, $\lambda_2$, $\nu_1$ and $\nu_2$. We considered four values of the sample size, namely n = 20, 30, 40 and 50. For the hyperparameters of the prior distribution, we considered one set of values as $\lambda_1 = \lambda_2 = \nu_1 = \nu_2 = 0$, which reduces the prior to a non-informative prior. For an informative prior, the hyperparameters are chosen on the basis of the information possessed by the experimenter. In most cases, the experimenter has a notion of the expected value of the parameter and can always associate a degree of belief with this value. In other words, the experimenter can specify the prior mean and prior variance of the parameters. The prior mean reflects the experimenter's belief about the parameter in the form of its expected value, and the prior variance reflects the confidence in this expected value. Keeping this in mind, we have chosen the hyperparameters in such a way that the prior mean is equal to the true value of the parameter and the belief in the prior mean is either strong or weak, i.e. the prior variance is small or large, respectively; for details see Singh et al. (2011). The biases of the estimates of the parameters, reliability and hazard rate, with the corresponding MSEs, have been calculated, and the results are summarized in Tables 3, 4 and 5.
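The Monte-Carlo estimation of bias and MSE can be sketched generically in Python. The code below is an illustration only. It assumes the GED cdf F(x) = (1 − e^{−θx})^α and generates lifetimes by inversion. To keep the example short and fast it uses a stand-in estimator that is not one of the estimators studied in this paper: the complete-sample MLE of α when θ is treated as known, which has the closed form $-n / \sum_i \ln(1 - e^{-\theta x_i})$. In the actual study the estimator would be the ML or Bayes estimate computed from a PITI censored sample. All function names are hypothetical.

```python
import math
import random

def ged_draw(alpha, theta, rng):
    """One GED lifetime by inversion: x = -ln(1 - U**(1/alpha)) / theta."""
    while True:
        w = rng.random() ** (1.0 / alpha)
        if 0.0 < w < 1.0:                      # skip the degenerate endpoints
            return -math.log1p(-w) / theta

def alpha_mle_known_theta(sample, theta):
    """Stand-in estimator: complete-sample MLE of alpha with theta known."""
    return -len(sample) / sum(math.log(-math.expm1(-theta * x)) for x in sample)

def mc_bias_mse(estimator, true_value, sampler, reps):
    """Monte-Carlo bias and MSE of `estimator` over `reps` simulated samples."""
    errors = [estimator(sampler()) - true_value for _ in range(reps)]
    bias = sum(errors) / reps
    mse = sum(e * e for e in errors) / reps
    return bias, mse

rng = random.Random(2024)
alpha, theta, n = 2.5, 2.0, 30
sampler = lambda: [ged_draw(alpha, theta, rng) for _ in range(n)]
estimator = lambda sample: alpha_mle_known_theta(sample, theta)
print(mc_bias_mse(estimator, alpha, sampler, reps=2000))
```

The same `mc_bias_mse` harness applies unchanged to any estimator and sampler pair, for example a sampler that returns censored counts and an estimator that maximises the censored likelihood.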
Table 3 provides the absolute bias and MSE of the estimates of the parameters, along with those of the reliability and hazard rate at time t = 1, for α = 2.5, θ = 2 and inspection times 0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6. It can be seen from the table that, in general, the bias and MSE decrease as n increases in all the considered cases. It can also be seen that the MSE of the MLE is larger than that of the corresponding Bayes estimate in all cases, but the difference between the MSEs of the Bayes and ML estimates decreases as n increases. The bias and MSE of the estimates under censoring scheme 1 are approximately equal to those of the complete sample case (denoted as scheme 0) and smaller than those under the other schemes. In most cases, the bias and MSE are smallest under dropping scheme 1, followed by schemes 5, 4, 3 and 2, in that order. The bias and MSE of the reliability estimate show a trend similar to that observed for the parameter estimates.
Table 4 provides the absolute bias and MSE of the various estimators for different choices of the model parameters. Since the bias and MSE decrease as the sample size increases, we have reported the results for n = 30 only. Similarly, since the estimates under dropping scheme 1 perform as well as those for the complete sample and better than those under all the other schemes, we have reported the results for the complete sample case, scheme 1 and scheme 4 only. It may be seen from the table that the bias and MSE of all the considered estimates of α, θ, the reliability $S_{ML}(t = 1)$ and the hazard rate $H_{ML}(t = 1)$ increase as α increases and/or as θ increases. It is interesting to note that the bias and MSE of all the estimates are smaller when the proportion of droppings is smaller. All the estimates under scheme 1 have more or less the same bias and MSE as those obtained for the complete case, whereas the bias and MSE of the estimates under scheme 4 are a little larger. The bias and MSE of the Bayes estimates obtained using various priors are presented in Table 5; we see that as the prior confidence in the guessed value increases, the MSE decreases.

Conclusions
In the present work, we have considered both classical and Bayesian analysis of progressive interval type-I censored data when the lifetimes of the items follow the generalised exponential distribution. The ML estimates do not have explicit forms, so the Newton-Raphson algorithm has been proposed to compute the MLEs. The Bayes estimates under the squared error loss function also do not exist in explicit form, but they can be obtained routinely through the MCMC technique, taking the shape and scale parameters to have independent gamma priors. On the basis of this study, we may conclude that the proposed estimation procedures under progressive interval type-I censoring with specific choices of the scheme can be easily implemented. It is also seen that the inspection and dropping schemes affect the performance of the estimators. Thus, if possible, it is better to choose a scheme that results in fewer droppings. However, in most practical situations the dropping scheme is not controllable. In such situations, the inspection plan should be designed so as to result in the least number of droppings. In any case, the proposed method can be used to obtain the estimates under any scheme.
We have not considered any covariates in this paper, but in practice covariates are often present. It will be interesting to develop statistical procedures for estimating the unknown parameters in the presence of covariates. Further, we have considered the dropping probabilities at each stage to be fixed, but in real life these may be random, and a suitable model to capture this randomness can be developed. Work in this direction is in progress.

Figure 1: Progressive interval type-I censoring scheme

Figure 2: Contour plot for plasma cell myeloma data

Figure 3: Iteration trace and density plot of MCMC samples for plasma cell myeloma data

Table 3: Simulated bias (MSE) of estimates of parameters, reliability and hazard rate for fixed α = 2.5, θ = 2 and inspection times 0.2(0.2)1.6
Notes: (a) the true value of reliability at time 1 is S(1) = 0.3048; (b) the true value of the hazard rate at time 1 is H(1) = 1.7851; (c) scheme 0 denotes the complete case, with no dropping and data points collected continuously.

Table 4: Simulated bias (MSE) of estimates of parameters, reliability and hazard rate for various choices of parameters and fixed n = 30