Power Modiﬁed Lindley Distribution: Properties, Classical and Bayesian Estimation and Regression Model with Applications

In this article, we explore a new probability density function, called the power modiﬁed Lindley distribution. Its main feature is to operate a simple trade-oﬀ among the generalized exponential, Weibull and gamma distributions, oﬀering an alternative to these three well-established distributions. The proposed model turns out to be quite ﬂexible: its probability density function can be right skewed and its associated hazard rate function may be increasing, decreasing, unimodal and constant. First the model parameters of the proposed distribution are obtained by the maximum likelihood method. Next, Bayes estimators of the unknown parameters are obtained under diﬀerent loss functions. In addition, bootstrap conﬁdence intervals are provided to compare with Bayes credible intervals. Besides, log power modiﬁed Lindley regression model for censored data is proposed. Two real data sets are analyzed to illustrate the ﬂexibility and importance of the proposed model.


Introduction
Recent studies show a spur in the efforts of constructing new univariate distributions and these efforts are mostly guided by the theoretical considerations or practical applications or both. It is essential that any probability distribution developed must reflect the data accurately. Researchers often adopted different classical distributions for modeling and illustrating real life data arising in different fields such as economics, finance, engineering, biology, insurance, medical sciences, etc. However, it has been very often observed that many of these distributions are unable to fit some of the real data sets accurately. Hence, there is need to identify distributions which can be applied for modeling the data accurately. The efficacy of any statistical inference with respect to data sets mainly depends on the knowledge of appropriate distribution. Therefore, there is a need for extending existing classical distributions, so as to enhance their flexibility, adaptability and goodness-of-fit in modeling data accurately. In this paper, a new modified distribution with increased flexibility for fitting data has been proposed. In literature, we come across several approaches for extending and developing of some well-known distributions into generalized classes of distributions. For greater details on various approaches in generating distributions, one may refer to Abd-Elfattah (1987).
The one-parameter Lindley distribution (LD) introduced by Lindley (1958) in the context of fiducial and Bayesian statistics is in fact an amalgamation of exponential and gamma distributions. It became popular in statistical literature only after Ghitany, Atieh, and Nadarajah (2008) studied some of its properties and application. Although this distribution is not suitable for non-monotone failure rate data but it is capable of modeling an increasing hazard rate function. The statistical literature abounds in many extended forms of Lindley distribution covering a wide range of shapes of hazard rate function including the unimodal ones. However, these extended forms usually involve two to five parameters and hence complexity arises for statistical inference when the number of parameters involved in these extended distribution are more than two. Applying different formulation, several authors extended Lindley distribution. Some of the extended work includes: Ghitany, Alqallaf, Al-Mutairi, and Husain (2011) proposed Weighted Lindley (WEL) distribution, generalized Lindley distribution (GL) was proposed by Nadarajah, Bakouch, and Tahmasbi (2011), extended Lindley distribution by exponentiation was proposed by Bakouch, Al-Zahrani, Al-Shomrani, Marchi, and Louzada (2012), Barreto-Souza and Bakouch (2013) proposed exponential Poisson Lindley (EPL) distribution. Ghitany, Al-Mutairi, Balakrishnan, and Al-Enezi (2013) proposed power Lindley (PL) distribution. Asgharzadeh, Bakouch, Nadarajah, and Sharafi (2016) proposed a new weighted Lindley distribution (WL). Joshi and Jose (2018) proposed Wrapped Lindley distribution and so on. An interesting work with Lindley distribution was carried out recently by Chesneau, Tomy, and Gillariose (2021) wherein a new modified Lindley distribution was developed without considering any special function or additional parameters in its formulation. They showed that their proposed model provides better fit than exponential and Lindley distributions and it was suitable for modeling increasing, reverse bathtub (unimodal) and constant shaped hazard rate function.
The first objective of this paper is to introduce a new lifetime distribution called the power modified Lindley (PML) distribution and derive some of its properties. Modified Lindley distribution is a special case of the PML distribution when α = 1. The proposed distribution provides better fits than some well known lifetime distributions. The importance of the new distribution is the ability of describing decreasing, increasing and unimodal shaped hazard rate functions which is extensively used in many real life data. Several authors have discussed the situations where the data shows decreasing, increasing and unimodal shape hazard rates ( See, Proschan (1963), Bennette (1983), Efron (1988), Langlands, Pocock, Kerr, and Gore (1997), Kus (2007), Woosley and Cossman (2007), Koutras (2011), Bhati, Sastry, andQadri (2015), Abdi, Asgharzadeh, Bakouch, and Alipour (2019)). Besides, PML is a suitable model for fitting positively skewed data which may not be adequately modelled by many other distributions. Thus, it can be used to fit data related to public health, biomedical studies, industrial reliability, survival analysis and several other areas. Second objective is to estimate the unknown parameters of the model from both frequentist and Bayesian points of view for different sample sizes and different parameter values. In case of frequentist method, we consider maximum likelihood method while for Bayesian approach, we consider independent gamma priors for the model parametrs and five different loss functions (symmetric and asymmetric loss functions), namely squared error loss function (SELF); weighted squared error loss function (WSELF); modified squared error loss function (MSELF); precautionary loss function (PLF) and K-loss function (KLF). In addition, parametric and non-parametric bootstrap confidence intervals using frequentist approaches are provided to compare with Bayes credible intervals. To evaluate the performance of the estimators, a simulation study is carried out. Finally, two real life data sets have been analyzed for illustrative purposes. Third objective is to obtain the MLEs of the log power modified Lindley regression model for censored data to show the flexibility of the log power modified Lindley regression model. Thus far we have not come across any report on estimation of parameters and construction of two bootstrap confi-dence interval (CI) and highest posterior density (HPD) credible intervals for the considered distribution along with regression model for censored and uncensored data.
The rest of the paper is organized as follows. In Section 2, we introduce the power modified Lindley distribution. In Section 3, we discuss some of its properties including a mixture representation. In Section 4, the estimation of the model parameters and confidence intervals/credible intervals by the method of maximum likelihood, parametric and non-parametric bootstrap methods and Bayesian method are presented. In the same Section, we perform a simulation study to evaluate the performance of the aforementioned estimation methods. In Section 5, log power modified Lindley regression model for censored data is presented. Two real data sets are analyzed and presented in Section 6. Finally, we conclude the paper in Section 7.

Model description
The one parameter modified Lindley (ML) distribution proposed by Chesneau et al. (2021) with cumulative distribution function (cdf) Now, we introduce a skewness parameter to the modified Lindley distribution using a similar idea to Ghitany et al. (2013) i.e., Y = X α , α > 0 and to obtain a power modified Lindley (PML) distribution. The cdf of the two parameter power modified Lindley distribution is given by and the corresponding probability density function (pdf) and hazard functions (hrf) are given, respectively, by and When α = 1, the PML distribution reduces to ML distribution Chesneau et al. (2021).
Plots of the pdf and hrf of X, respectively, of the PML distribution for selected parameter values of α and θ are shown in Figure 1. Figure 1 shows that the PML density can be unimodal, positively skewed and approximately symmetric, whereas the hazard rate function of the PML distribution can be decreasing, increasing, unimodal and constant shapes. An advantage of the definition of f (x) is that we can write it as a linear combination of Weibull distribution and generalized gamma distribution as where p = 1 2(1 + θ) , From (4), we see that the PML distribution is a linear combination of Weibull distribution (with shape parameter α and scale parameter θ), a generalized gamma distribution (with shape parameters 2, α and scale parameter 2θ) and a Weibull distribution (with shape parameter α and scale parameter 2θ), with p = 1 2(1+θ) . This representation will be useful to determine the mathematical properties of the PML distribution.

Statistical and mathematical properties
Some mathematical properties of the PML distribution are presented in this section.

Asymptotics
In this section, we study the asymptotics of the pdf and hrf of the PML model. The behavior of f (x) at x = 0 and x = ∞, respectively, are given by The behavior of h(x) at x = 0 and x = ∞, respectively, are given by and

Moments and moment generating function
Let X be a random variable from PML distribution with pdf given in (2), then its moments is given by the following By using well known properties of the gamma function, for any positive integer r, For any t < θ, the moment generating function of PML distribution can be computed as The characteristic function of PML distribution, φ X (t) = E(e itx ), and the cumulant generating function of X, K(t) = log φ X (t), are given by and The central moments µ r and cumulants k r of X can be determined from equation (5) as 1 , etc. The skewness γ 1 = k 3 /k 3/2 2 and kurtosis γ 2 = k 4 /k 2 2 can be calculated from the second, third and fourth standardized cumulants.

Conditional moment, mean deviation, mean residual life and Bonferroni and Lorenz curves
For the PML distribution, it can be easily seen that the conditional moments E[X n |X > t], can be written as An application of the conditional moment is the mean residual life (MRL). Thus, the MRL function in terms of the first conditional moment is given by where µ 1 (t) can be obtained from (6) when n = 1.
Another application of the conditional moment is the mean deviations about the mean and the median. Let us denote the mean (µ) and median by M , then the mean deviations from the mean and the median can be obtained as respectively. Where µ 1 (µ) and µ 1 (M ) can obtained from (6). Also, F (µ) and F (M ) are easily calculated from (1).
The Bonferroni and Lorenz curves are proposed by Bonferroni (1930), these curves are used in several field like economics to study income and poverty, reliability, demography, insurance and medicine. These curves are defined by . The Bonferroni and Gini indices are defined by respectively. If X has the pdf in (2), then one can obtain Bonferroni curve of the PML distribution as and the Lorenz curves L(p) = pB(p).

Entropy and stress-strength reliability
Entropy is useful in gathering information about the uncertainty of the random experiment. Thus, the Renyi entropy of the PML distribution is given by The γ−entropy, say I γ (x), is defined by and then it follows from equation (7).
The stress-strength reliability has been widely used in reliability analysis as the measure of the system performance under stress. If X ∼ P M L(α 1 , θ 1 ) and Y ∼ P M L(α 2 , θ 2 ), then stress-strength reliability R is given by

Order statistics
Let X 1 , X 2 , · · · , X n be a random sample of size n from the PML distribution and X (1) , X (2) , · · · , X (n) be the corresponding order statistics. The probability density function of the rth order statistics is obtained as follow: For the PML distribution, the pdf of r th order statistic is obtained as The r th ordered moment is obtained as

Maximum likelihood estimation
Let x 1 , x 2 , · · · , x n be a random sample of size n from the PML distribution. Then, the likelihood function is given by The corresponding log-likelihood function is The maximum likelihood estimates of θ and α can be obtained by solving the following nonlinear equations: It is seen from the above non-linear equations that there are no closed form for the MLEsα andθ. Numerical procedures may be used to solve these nonlinear equations.

Bootstrap estimation
Here, we introduce two types of bootstrap estimation procedures. (see Efron and Tibshirani (1994)): Parametric type: • Assessment ψ vector, sayψ, employing the MLE conduct by means of a given sample.
• Extract sample {Y * 1 , . . . , Y * k } making useψ and getting the bootstrap estimate of ψ, say ψ * , from the current bootstrap sample based on MLE method.

Nonparametric type
• Extract a bootstrap sample {y * 1 , . . . , y * k } with replacement strategy from the initial data.
• Calculate the bootstrap approximation of ψ from the MLE method, say ψ * , making use the bootstrap sample.

Bayesian estimation
As a powerful and valid alternative to classical estimation, the Bayesian approach suggests a procedure to combine the observed information with the prior knowledge. Here, for the purpose of framing the Bayesian analysis, we set assumptions as: We now consider several (symmetric and asymmetric) loss functions (LS), namely, SELF, WSELF, MSELF, PLF and KLF. These loss functions with corresponding Bayesian estimators (BS) and posterior risks (PR) are provided in Table 1.
In the case of PML distribution, the exact joint posterior PDF of parameters α and θ, is given by where Υ = (α, θ) and K is normalizing constant and is given by Consequently, the marginal posterior P DF for the elements of vector Υ with Υ = (Υ 1 , Υ 2 ) = (α, θ), is given by where i, j = 1, 2, i = j and Υ i is the ith element of vector parameter Υ.

Point estimation in Bayesian framework
The Bayesian point estimation of parameter vector Υ, is provided in regard to the framework of the formulation in the previous subsection. Due to intractable integral form in (12), we use Gibbs sampler technique (Geman and Geman (1984)) to extract posterior samples. This will be argued in details in subsection 4.8.
It is evident from (10) that it is difficult to get the explicit form of the marginal PDFs. Therefore, we make use of the Gibbs sampler strategy in order to extract posterior samples. For a given posterior sample with size T , S = (Υ 1 , ..., Υ T ) (where Υ i = (Υ i 1 , Υ i 2 )), from the joint posterior PDF in (10), the marginal posterior PDFs of Υ j given x is where the Υ i −j is a vector of posterior sample that jth observation is removed. Inserting (14) in (13), it gives us the possibility to compute the credible intervals for Υ j , j = 1, 2 as follows

Highest posterior density interval
A (1 − ε)100% highest posterior density (HPD) interval for Υ j , j = 1, 2, can be obtained from the simultaneous solutions of the following equations

Generating posterior samples
Due to the intractable integral forms of the joint posterior and marginal posterior PDFs, we use Markov Chain Monte Carlo (MCMC) algorithm with the Gibbs sampling (GS) strategy (as a special case of the Metropolis-Hastings algorithm) to generate posterior samples. The GS strategy makes use the full set of conditional distributions and Markov chain process to generate posterior samples. For a general model with p dimensional parameter vector, the GS algorithm is presented as follows: Let f (x|υ) be a general PDF that is labeled with parameter vector υ = (υ 1 , υ 2 , ..., υ p ). Based on a given sample x and initial parameter vector υ 0 = (υ 1 , υ 2 , ..., υ p ), the Gibbs sampler gives the values for each iteration with p steps by extracting a new value for each parameter from its full conditional PDF. In symbols, the steps for each iteration (iteration l), are as follows: • Set an initial parameter vector (υ • Extract υ l 1 from π υ 1 |υ l−1 2 , υ l−1 3 , ..., υ l−1 p , x • Extract υ l 2 from π υ 2 |υ l 1 , υ l−1 3 , ..., υ l−1 p , x ; and so on down to • Extract υ l p from π υ p |υ l 1 , υ l 2 , ..., υ l p−1 , x .
Making use the above GS algorithm, the posterior samples of the parameters α and θ of PML distribution are generated from the full conditional posterior PDFs respectively.
There are some special software tools for the implementation of Bayesian analysis. Here, we use the specialized OpenBUGS software to extract posterior samples from the above complicated and unexplicit full conditional PDFs. This software and its earlier version, WinBUGS, have been designed based on the Gibbs sampling algorithm to generate posterior samples from complicated posterior distributions. Here, we use the idea of Congdon (2001) to set the initial values for the hyper-parameters and their values are conducted as θ 0 = θ 1 = α 0 = α 1 = 0.0001.

Monte Carlo simulation
This section presents Monte Carlo simulation results to assess the performance of MLE mentioned in the previous section. First, we generate different samples with size n from (1) based upon the inversion method which implies to find the root of (1 − u)(1 + θ)e θx α = 1 + θ + θx α e −θx α , where u ∼ U (0, 1). We compute the mean square errors (MSEs) and biases of the MLEs of the parameters based on N = 10, 000 iterations. The results are summed up in Table 2 for some selected parameter values and several sample sizes, n. The results in Table 2 indicate that the MSEs and biases of the MLEs decrease when the sample size n increases. So, the MLEs of the parameters are consistent.
It is evident from Table 3 that with increasing sample size n, the posterior risk decreases and this confirms the consistency property. We also observe that as n increases, Bayes estimate of α based on KL loss function provide superior performance than other Bayes estimates whereas Bayes estimate of θ based on PL loss function perform better than other loss functions as θ decreases.

The PML regression model
Here, we present a new survival regression based on PML distribution. Assume that the random variable X follows PML distribution, given in (2). We obtain the log-PML (LPML) distribution by applying Y = log(X) transformation and considering the re-parametrization, α = 1 σ and θ = e − µ σ . The PDF of the transformed variable Y is h (y; σ, µ) = 1 where µ ∈ and σ > 0 are location and scale parameters, respectively. We refer to equation (16) as the LPML distribution, say Y ∼ LPML(α, σ, µ). The survival function of (16) is Let Z = (Y − µ)/σ, then the PDF of the standardized random variable is The LPML regression is defined by and v i is the covariate variable vector and z i is the random error with density h (z; σ, µ). The regression parameters are represented by τ = (τ 1 , . . . , τ p ) .
Let y i be a dependent variable defined as y i = min{log(x i ), log(d i )}, where log(x i ) follows (16) and log(d i ) represent the log-lifetime and log-censoring times. Let M 0 and M 1 be the sets of individuals with log-lifetime and log-censoring, respectively. Under these definitions, the log-likelihood function for ψ = (α, σ, τ ) is where u i = e z i , z i = (y i − v i τ )/σ and r denotes the number of uncensored events. The MLE of ψ can be obtained by maximizing (19) using the optim function of R software.

Residual analysis
To decide the accuracy of fitted regression model, we analyze the departure from error distribution by means of residual analysis. Here, two residuals are used. These are martingale and modified deviance residuals. Residual analysis is an important step of any regression analysis to check the adequacy of fitted model. The martingale residuals of the LPML regression model are where u i = e y i −v i τ σ . Using the martingale residuals, the modified deviance residuals of the LPML regression are where r M i is the martingale residual. The modified deviance residuals are more acceptable and usable than martingale residuals. The reason is that the modified deviance residuals are normally distributed with zero mean and unit variance once the fitted regression model is suitable and accurate for the data used.

Application
In order to show the flexibility and potential of the proposed model, we consider two real data sets based on univariate and covariate observations. In order to achieve this goal. First we consider the exact times of failure of fatigue fracture of Kevlar 373/epoxy data set and the parameter estimations are done by means of three methods (maximum likelihood, Bayesian and bootstrap) which are discussed in Section 4. Second we show the performance of survival regression model of PML distribution via maximum likelihood method by analyzing heart transplant data set which is associated with covariate variables.

Univariate data modeling
We consider exact times of failure of fatigue fracture of Kevlar 373/epoxy which were formerly studied by Andrews and Herzberg (2012).The current data are given as below: Next, we plot the total time of test (TTT) for these data set; for more details about TTT plot, see Abd-Elfattah (1987). This is the primary step in graphical inspection to prove whether this data can be applied to the PML distribution or not. The corresponding TTT plot for this data is displayed in Figure 1 and indicates that the empirical hazard function of the data is increasing. Hence, the PML density is adequate to model these data set.

Bootstrap inference for PML parameters
We obtain the point and 95% confidence interval (CI) estimation for the two parameters of the PML distribution via parametric and non-parametric bootstrap methods. We provide bootstrap estimation results in Table 4 for the failure of fatigue fracture data set. We plot the joint distribution of the bootstrap values in a scatter plot to see the potential correlation structure between the parameters, . The corresponding plots of the bootstrap estimations are shown in Figure 3.   Table 5. The densities of Lindley, Weibull, gamma, generalized exponential (GE) and power Lindley (PL) distributions are: x > 0, and f (x; θ, α) = α θ 2 θ + 1 (1 + x α )x α−1 e −θx α ; x > 0.
From Table 5, we observe that the PML distribution gives the best fit for the current data set as it shows the highest p-value as well as lowest AIC and BIC values than the other models. The histogram, fitted PML, Weibull, gamma and generalized exponential (GE) PDFs for the current data set are shown in Figure 3. Both empirical and fitted CDFs functions, P-P and Q-Q plots for the competitor models, namely, PML, Weibull, gamma and generalized exponential (GE) distributions are displayed in Figure 4, respectively. These plots prove the reported results in Table 5.
Next, we provide graphical and numerical results for the Bayesian analysis of the current data set. Making use of the MCMC procedure with 10, 000 replicates, we obtain Bayesian estimation of the parameters of the PML distribution based on considered loss functions in Table 1. Bayes point estimates and their corresponding posterior risk are provided in Table  6. The credible and HPD intervals at nominal level 95% are reported in Table 7. In order to examine the performance of the Gibbs sampling process and its convergence, we provide the associated plots of posterior samples in Figures 6, 7 and 8. It is evident that these plots also confirm the good performance of Gibbs sampling process in generating posterior samples.    In order to evaluate the performance the MCMC procedure and Gibbs sampling convergence in generating posterior samples, we compute some diagnostics measures such as Gelman-Rubin, Geweke and Raftery-Lewis (see Lee, Kim, and Lee (2014)) in Table 8. The Gelman-Rubin diagnostic for both parameters α and θ is equal to 1. The Geweke's test statistics for these parameters are −0.181 and −0.934, respectively. Moreover, the reported diagnostic statistics for parameters α and θ based on the Raftery-Lewis method doesn't show significant correlations between estimates. The corresponding plots for diagnostics measures Gelman-Rubin and Geweke's test statistics are displayed in Figures 9 and 10, respectively. These plots show that the chain is acceptable and the estimated values have good mixing.  In this application, we assess the performance of LPML regression model using heart transplant data set which is available in survival package of R software. Next, we present results by fitting the regression model where y i is distributed as LPML distribution and the covariate variables x i1 , x i2 and x i3 are associated to the heart transplant data and described as: • x i1 = age at acceptance; • x i2 = previous surgery (surgery coded as 1 = yes; 0 = no); • x i3 = transplant (coded as 1 = yes; 0 = no).

Parameter estimation
Here, we fit the LPML regression model to the current data by MLE method and compare the results with log-Weibull and log-exponential Pareto distributions. The estimated parameters, standard errors (SEs), confidence intervals (CIs) as well as corresponding p-values are listed in Table 9. From this results, we conclude that the estimated regression parameters are statistically significant at 5% level.

Results of residual analysis
The suitability of the fitted LPML regression model is evaluated by residual analysis. The plot of the modified deviance residuals and its quantile-quantile (Q-Q) plot are displayed in 11 which reveal that the fitted LPML regression model provides good fit to the data used.

Concluding remarks
In this work, we introduce an extended form of ML distribution called the power modified Lindley (PML) distribution for which the hazard rate function accommodates the four types of forms, i.e. constant, increasing, decreasing and unimodal. We obtain some of its mathematical/statistical properties including moments, moment generating function, conditional moment, mean deviation, mean residual life, Bonferroni and Lorenz curves, entropy, stressstregth reliability and order statistics. Model parameters are obtained by using maximum likelihood method and Bayesian approach. In Bayesian approach, we consider independent gamma priors for the model parameters and five different loss functions (symmetric and asymmetric loss functions). We also obtained parametric and non-parametric bootstrap confidence intervals using frequentist approach and compared with Bayes credible intervals. Further, we propose the new log power modified Lindley regression model and model parameters are estimated using maximum likelihood method. One important aspect of this new model is that it provides better fits than some well-known models using two real data sets. Simulation results show that Bayes estimate of α based on KLF performs better than their counterparts, while Bayes estimate of θ based on PLF produces better estimate as compared to other loss functions. Real data analysis shows the similar trend of results.