New Distribution for Fitting Discrete Data: The Poisson-Gold Distribution and Its Statistical Properties

Motivated mainly by lifetime issues, this paper introduces a new lifetime distribution, named the “Discrete Poisson-Gold distribution”. Several structural properties of the new distribution are derived, including the moment generating function and the r th moment. In addition, we discuss estimation of the distribution parameters by the maximum likelihood method and the method of moments. The usefulness of the distribution is illustrated by means of two real-data applications, in which it fits the data better than some other well-known lifetime distributions, showing its versatility in practical applications.


Introduction
Studying and analyzing lifetime data plays a crucial role in a wide variety of fields in the applied sciences, such as engineering, medicine, insurance, economics and marketing. The exponential distribution is widely used for modeling such data. Nonetheless, the exponential distribution does not provide a reasonable parametric fit in some practical applications where the underlying failure rate is not constant (for example, monotone). To overcome this problem, several probability distributions have recently been developed to model such data by compounding useful lifetime distributions (Gómez-Déniz, Sordo, and Calderín-Ojeda (2014); Shanker (2015); Al-Omari, Al-Nasser, and Ciavolino (2019); Al-Omari and Shraa (2019); Altun (2019)).
Compounding the Poisson distribution with continuous distributions plays a prominent role in deriving new families of distributions. Recall that a random variable X is said to have a Poisson distribution (PD) with parameter θ > 0 if its probability mass function is $P(X = x) = \frac{e^{-\theta}\theta^{x}}{x!}$, x = 0, 1, 2, .... The resulting compound distributions are more flexible for modeling lifetime data and, sometimes, provide reasonable parametric fits in practical applications such as lifetime and reliability studies. This flexibility comes from the fact that the failure rate can take one or more shapes: decreasing, increasing, bathtub shaped or unimodal. Indeed, it is possible to construct the mathematical features of a lifetime distribution for a specific life phenomenon based on its failure rate pattern.
Lindley (1958) introduced the Lindley distribution (LD), which belongs to the exponential family, for analyzing lifetime data. It has been used as an alternative to the exponential lifetime distribution when the failure rate is unimodal (Bakouch, Al-Zahrani, Al-Shomrani, Marchi, and Louzada 2012). It has also been widely used in the medical sciences, the biological sciences and engineering for data with an increasing hazard rate function. The probability density function (pdf) of the LD with scale parameter θ > 0 is defined by $f(x;\theta) = \frac{\theta^{2}}{1+\theta}(1 + x)e^{-\theta x}$, x > 0, θ > 0.
The corresponding cumulative distribution function (cdf) is given by $F(x;\theta) = 1 - \frac{1+\theta+\theta x}{1+\theta}e^{-\theta x}$, x > 0, θ > 0. The pdf of the LD is a two-component mixture of exponential(θ) and gamma(2, θ) densities. Figure 1 shows the pdf and the cdf of the LD for some parameter values. Statistical properties and parameter estimation of the LD were studied and applied to waiting time data by Ghitany, Atieh, and Nadarajah (2008) and Ghitany, Alqallaf, Al-Mutairi, and Husain (2011). Using some comparison criteria, it was shown that the LD is better than the exponential distribution in modeling lifetime data. Because of the failure rate property of the LD, there are some situations where the distribution fails to provide a good fit to real lifetime data. To address this, many researchers have proposed new classes of distributions based on generalizations and modifications of the LD. Recently, Al-Talib, Al-Nasser, and Ciavolino (2019) proposed a new continuous distribution named the Gold distribution (GD), which offers a flexible model for lifetime data and generalizes many statistical distributions such as the exponential, gamma and Lindley distributions.
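The mixture representation of the LD can be checked numerically. The following is a minimal sketch, not part of the original derivation: it assumes the standard mixing weights θ/(1 + θ) for the exponential component and 1/(1 + θ) for the gamma(2, θ) component, and the function names are illustrative only.

```python
import math

def lindley_pdf(x, theta):
    """Lindley density: theta^2 / (1 + theta) * (1 + x) * exp(-theta * x)."""
    return theta**2 / (1.0 + theta) * (1.0 + x) * math.exp(-theta * x)

def lindley_cdf(x, theta):
    """Lindley cdf: 1 - (1 + theta + theta * x) / (1 + theta) * exp(-theta * x)."""
    return 1.0 - (1.0 + theta + theta * x) / (1.0 + theta) * math.exp(-theta * x)

def lindley_as_mixture(x, theta):
    """Same density written as a mixture of exponential(theta) and gamma(2, theta).

    Assumption: weights theta / (1 + theta) and 1 / (1 + theta).
    """
    w = theta / (1.0 + theta)                     # weight of the exponential part
    expo = theta * math.exp(-theta * x)           # exponential density, rate theta
    gam2 = theta**2 * x * math.exp(-theta * x)    # gamma density, shape 2, rate theta
    return w * expo + (1.0 - w) * gam2
```

For any x > 0 and θ > 0, `lindley_pdf` and `lindley_as_mixture` should agree up to floating-point error, and a finite-difference derivative of `lindley_cdf` should recover `lindley_pdf`.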
On the other hand, Sankaran (1970) proposed the discrete Poisson-Lindley distribution by compounding the Poisson and Lindley distributions. Adamidis and Loukas (1998) introduced a two-parameter exponential-geometric distribution with decreasing failure rate by compounding the exponential and geometric distributions. Gupta and Kundu (1999) and Gupta and Kundu (2001) introduced and studied the generalized exponential distribution as an extension of the exponential distribution. Kuş (2007) proposed an exponential-Poisson distribution by compounding the exponential and zero-truncated Poisson distributions. Chahkandi and Ganjali (2009) introduced a new two-parameter family of distributions with decreasing failure rate by compounding the power-series and exponential distributions.
In this paper, a new discrete distribution is introduced by compounding the Poisson and Gold distributions. The main motivations for introducing this lifetime model are to: (i) propose a new, more flexible lifetime distribution that can be used for modeling lifetime data in a wider class of reliability problems; (ii) extend both Poisson and Gold distributions; (iii) accommodate a broad class of failure rate functions.
The rest of the paper is organized as follows. In Section 2, we review the GD. Section 3 describes the new Poisson-Gold distribution (PGD) and studies its basic properties, including its pmf and cdf. Section 4 introduces some discrete distributions related to the PGD. Section 5 investigates the factorial moments of the PGD, including its mean and variance; the moment and probability generating functions are also explored in this section. The hazard rate function is discussed in Section 6. Estimation of the distribution parameters by maximum likelihood and the method of moments is investigated in Section 7. In Section 8, two real data applications illustrate the performance of the PGD relative to other competing reliability distributions, and the results are reported. The article ends with concluding remarks in Section 9.

The Gold distribution
The GD is a mixture of k independent gamma distributions with a scale parameter (θ) and different shape parameters (j); where j = 1, 2, . . . , k.
Definition 1. A random variable X is said to have a GD if the pdf and cdf of X are given by , x > 0, θ > 0 ; k = 1, 2, ... and x = 1, (1) The mean and variance are found to be: Al-Talib et al. (2019) obtained many interesting properties of the GD.

Poisson-Gold distribution
The GD is a mixture of several gamma distributions; therefore, it can be considered as a conjugate distribution of the Poisson distribution (PD), which is one of the most widely used discrete distributions. This fact leads to an analytically tractable compound distribution, in which one thinks of the parameter λ of the PD as a random variable drawn from the GD.

Figure 3: The cdf of GD for some parameter values
Definition 2. A random variable X follows a mixed PD with mixing distribution having probability density function g if its probability function is given by $P(X = x) = \int_{0}^{\infty} \frac{e^{-\lambda}\lambda^{x}}{x!}\, g(\lambda)\, d\lambda$, x = 0, 1, 2, .... This can be rewritten in terms of the probability generating function, $M_X(t)$, as $M_X(t) = \int_{0}^{\infty} e^{\lambda(t-1)} g(\lambda)\, d\lambda$. Note that the right-hand side of this equation is the moment generating function of the mixing distribution evaluated at t − 1. This immediately implies that the probability generating function of the mixed PD uniquely determines the mixing distribution through its moment generating function. In the sequel, the mixed PD with mixing density g is denoted by MP(g), while its probability function is denoted by P(x). Note that λ can be a continuous or discrete random variable, or it may take only a finite number of values.
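The mixing integral in Definition 2 can be approximated numerically for any mixing density. The sketch below is illustrative only: it uses a composite Simpson rule on a truncated range, and the truncation point `upper`, the number of subintervals `n` and the function names are assumptions chosen for the example, not part of the paper. As a check, an exponential mixing density with mean θ is known to produce a geometric mixture with pmf θ^x/(1 + θ)^(x+1).

```python
import math

def poisson_pmf(x, lam):
    """Poisson probability of the count x for a given rate lam."""
    return math.exp(-lam) * lam**x / math.factorial(x)

def mixed_poisson_pmf(x, g, upper=60.0, n=6000):
    """Approximate the integral of Poisson(x; lam) * g(lam) over [0, upper].

    Uses the composite Simpson rule with n (even) subintervals.
    `upper` must be large enough that the integrand is negligible beyond it.
    """
    h = upper / n
    total = poisson_pmf(x, 0.0) * g(0.0) + poisson_pmf(x, upper) * g(upper)
    for i in range(1, n):
        lam = i * h
        weight = 4 if i % 2 else 2   # Simpson weights alternate 4, 2, 4, ...
        total += weight * poisson_pmf(x, lam) * g(lam)
    return total * h / 3.0
```

For example, with `theta = 1.5` and `g = lambda lam: math.exp(-lam / theta) / theta`, the values `mixed_poisson_pmf(x, g)` should match the geometric probabilities to within the quadrature error.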
Corollary 1. Suppose that the parameter λ of the PD follows GD (1). Then, the Poisson mixture of GD "PGD(k, θ)" is given by Proof. Using the fact given in (2), then Or simply the probability mass function (pmf) of PGD can be written as which completes the proof of Corollary 1.
Figure 4 shows bar diagrams of the pmf of the PGD for different values of k and θ. In particular, the null probability can be expressed as: Moreover, the cumulative distribution function (cdf) of the PGD will be

PGD related distributions
The PGD is a generalization of some well-known discrete distributions. When k = 1, the pmf in (3) becomes $P(X = x) = \frac{1}{1+\theta}\left(\frac{\theta}{1+\theta}\right)^{x}$, x = 0, 1, 2, ..., which is the geometric distribution with failure probability q = θ/(1 + θ). Note that this is exactly the Poisson-Exponential distribution. In fact, based on the GD properties, the PGD can be considered as a family of discrete distributions, in which the value of the parameter k determines the member of the family. Table 1 presents several special cases of PGD(θ, k).
Table 1: Special cases of PGD(θ, k)
Special case | Distribution | Reference
 | Poisson-Sujatha distribution | Shanker and Fesshaye (2016)
PGD(1/θ, 4) | Poisson-Amarendra distribution | Shanker (2016)
PGD(1/θ, 5) | Poisson-Devya distribution | New discrete distribution
PGD(1/θ, 6) | Poisson-Shambhu distribution | New discrete distribution
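The special cases can be explored numerically. The following sketch rests on an explicit assumption that is not confirmed by the text as extracted here: the GD density is taken to be proportional to (1 + λ + ... + λ^(k−1)) e^(−λ/θ), with normalizing constant Σ Γ(j) θ^j for j = 1, ..., k. This form is chosen because it reproduces the exponential and Lindley densities for k = 1 and k = 2. Under that assumption, integrating the Poisson pmf against the density gives the closed form coded below; the function names are hypothetical.

```python
import math

def pgd_pmf(x, k, theta):
    """PGD pmf under the ASSUMED Gold density
    g(lam) = sum_{j=1..k} lam^(j-1) * exp(-lam / theta) / sum_{j=1..k} Gamma(j) * theta^j.
    """
    norm = sum(math.gamma(j) * theta**j for j in range(1, k + 1))
    q = theta / (1.0 + theta)
    num = sum(math.gamma(x + j) * q**(x + j) for j in range(1, k + 1))
    return num / (math.factorial(x) * norm)

def geometric_pmf(x, theta):
    """Geometric pmf with failure probability q = theta / (1 + theta)."""
    q = theta / (1.0 + theta)
    return (1.0 - q) * q**x

def poisson_lindley_pmf(x, theta):
    """Poisson-Lindley pmf: theta^2 * (x + theta + 2) / (theta + 1)^(x + 3)."""
    return theta**2 * (x + theta + 2.0) / (theta + 1.0)**(x + 3)
```

With this assumed density, `pgd_pmf(x, 1, theta)` coincides with the geometric pmf, and `pgd_pmf(x, 2, 1 / theta)` coincides with the Poisson-Lindley pmf, which is in line with the PGD(1/θ, k) parameterization used in Table 1.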

Moments and generating functions
Many of the most important measures and characteristics of a distribution can be derived and investigated through its moments. Therefore, in this section we consider the moment generating function, the probability generating function and the factorial moments of the PGD(k, θ) distribution.

Moment generating function
Corollary 2. Suppose that X ∼ PGD(k, θ); then the moment generating function (mgf) is given as: It is well-known that the mgf of the PD is given by $M_{X\mid\lambda}(t) = e^{\lambda(e^{t}-1)}$. As a result, which completes the proof of Corollary 2.

Probability generating function
Corollary 3. Suppose that X ∼ PGD(k, θ); then the probability generating function (PGF) of X is defined as: Proof. The PGF of X is defined as $G_X(s) = E(s^{X}) = \sum_{x=0}^{\infty} s^{x} P(X = x)$. By using Taylor expansion, we have Therefore,

Hence,
which completes the proof of Corollary 3.
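For k = 1, where the PGD reduces to the geometric distribution, both generating functions have a simple closed form that can be checked against direct summation. The sketch below is restricted to that case: the PGF is the mgf of an exponential mixing distribution with mean θ evaluated at s − 1, that is 1/(1 − θ(s − 1)), and the mgf follows by setting s = e^t. The function names and the truncation length `terms` are illustrative assumptions.

```python
import math

def pgf_series(s, theta, terms=500):
    """E[s^X] for k = 1 by direct summation of the geometric pmf.

    Requires |s| * theta / (1 + theta) < 1 for convergence.
    """
    q = theta / (1.0 + theta)
    return sum((1.0 - q) * (s * q)**x for x in range(terms))

def pgf_closed(s, theta):
    """Closed form for k = 1: mixing mgf evaluated at s - 1."""
    return 1.0 / (1.0 - theta * (s - 1.0))

def mgf_series(t, theta, terms=500):
    """E[exp(t X)] for k = 1, obtained from the PGF at s = exp(t)."""
    return pgf_series(math.exp(t), theta, terms)

def mgf_closed(t, theta):
    """Closed form for k = 1; valid for exp(t) < (1 + theta) / theta."""
    return pgf_closed(math.exp(t), theta)
```

The series and closed forms should agree wherever the series converges, and both generating functions equal 1 at s = 1 and t = 0 respectively.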

Factorial moments of PGD
Corollary 4. The r th factorial moment about the origin of a random variable X, where X ∼ PGD(k, θ) and θ, k > 0, is given by the following formula Proof. Let $X^{(r)} = \frac{X!}{(X-r)!}$ be the r th factorial of X; then the factorial moment can be obtained as However, it can be shown with a simple transformation that $\sum_{x=r}^{\infty} \frac{e^{-\lambda}\lambda^{x-r}}{(x-r)!} = 1$. It follows that Thus, which completes the proof of Corollary 4.
Accordingly, the mean of the PGD is given directly as However, the variance of the PGD can be found based on the second factorial moment: Therefore,
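The key step in the proof, namely that the r th factorial moment of a mixed Poisson variable equals the r th raw moment of the mixing distribution, can be illustrated for k = 1. In that case the mixing distribution is exponential with mean θ, whose r th raw moment is r! θ^r, so the mean is θ and the variance is θ + θ². The sketch below checks this by direct summation; it covers only k = 1, and the names and truncation length are illustrative.

```python
import math

def geometric_pmf(x, theta):
    """pmf of the PGD for k = 1 (geometric with q = theta / (1 + theta))."""
    q = theta / (1.0 + theta)
    return (1.0 - q) * q**x

def falling_factorial(x, r):
    """x * (x - 1) * ... * (x - r + 1), i.e. x! / (x - r)!."""
    out = 1
    for i in range(r):
        out *= (x - i)
    return out

def factorial_moment(r, theta, terms=2000):
    """r-th factorial moment for k = 1 by direct (truncated) summation."""
    return sum(falling_factorial(x, r) * geometric_pmf(x, theta)
               for x in range(r, terms))

def variance_from_factorial_moments(theta):
    """Var(X) = E[X(X-1)] + E[X] - (E[X])^2."""
    m1 = factorial_moment(1, theta)
    m2 = factorial_moment(2, theta)
    return m2 + m1 - m1**2
```

For a given θ, `factorial_moment(r, theta)` should be close to `math.factorial(r) * theta**r`, and `variance_from_factorial_moments(theta)` should be close to `theta + theta**2`.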

Hazard rate function
The hazard rate is used to monitor the lifetime of a unit, since it is more informative about the underlying mechanism of failure than the lifetime distribution. The hazard rate function of the PGD is given by Thus, The above expression is a decreasing function in x, implying unimodality. Furthermore, Now, since the pmf P(X = x) is log-concave, the distribution in (3) has an increasing failure rate; see Johnson, Kemp, and Kotz (2005) (p. 209). For instance, for k = 1 we have $\frac{P(X = x + 1)}{P(X = x)} = \frac{\theta}{1+\theta}$; therefore, $P(X = x + 1) = \frac{\theta}{1+\theta} P(X = x)$.
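The k = 1 case can be checked numerically. The sketch below assumes the usual discrete hazard rate h(x) = P(X = x)/P(X ≥ x), which may differ from the exact expression used in the paper, and it covers only k = 1. For the geometric distribution the ratio of successive probabilities is constant, so the hazard is constant as well and equals 1/(1 + θ); this is the boundary case of a non-decreasing failure rate.

```python
def geometric_pmf(x, theta):
    """pmf of the PGD for k = 1 (geometric with q = theta / (1 + theta))."""
    q = theta / (1.0 + theta)
    return (1.0 - q) * q**x

def survival(x, theta, terms=2000):
    """P(X >= x), approximated by a truncated tail sum."""
    return sum(geometric_pmf(y, theta) for y in range(x, x + terms))

def hazard(x, theta):
    """Discrete hazard rate h(x) = P(X = x) / P(X >= x) (assumed definition)."""
    return geometric_pmf(x, theta) / survival(x, theta)

def successive_ratio(x, theta):
    """P(X = x + 1) / P(X = x); constant in x when k = 1."""
    return geometric_pmf(x + 1, theta) / geometric_pmf(x, theta)
```

For every x, `successive_ratio(x, theta)` should be close to θ/(1 + θ) and `hazard(x, theta)` close to 1/(1 + θ).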

Parameter estimation
In this section, we investigate the estimation of the parameters of the PGD by maximum likelihood and by the method of moments.

Maximum likelihood estimation
The maximum likelihood method is one of the most widely used estimation techniques in statistics. It finds the parameter value that maximizes the likelihood function. More precisely, if the data are independent and identically distributed, then we have Consequently, Equivalently, to find the value of θ that maximizes the likelihood function of the PGD, we can use the logarithm of the likelihood function, which can be maximized by setting its first derivative with respect to θ equal to zero.
As a special case, when k = 1, solving the above equation gives $\hat{\theta} = \bar{X}$. Another special case, when k = 2: Solving such a non-linear equation can be done using any numerical iteration method, such as the Newton-Raphson method.
In general, the maximum likelihood estimate of θ can be found numerically by solving the following non-linear equation:
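A Newton-Raphson iteration on the score function can be sketched for the case k = 1, where the result can be compared with the closed form, the sample mean. For the geometric pmf θ^x/(1 + θ)^(x+1), the log-likelihood is Σ [x_i log θ − (x_i + 1) log(1 + θ)], so the score is Σx_i/θ − (Σx_i + n)/(1 + θ). The code below is only an illustration for this case: the starting value, the tolerance and the sample are hypothetical, the sample is assumed to contain at least one positive count, and no safeguard is included against iterates leaving the region θ > 0.

```python
def score(theta, data):
    """First derivative of the k = 1 log-likelihood with respect to theta."""
    n, s = len(data), sum(data)
    return s / theta - (s + n) / (1.0 + theta)

def score_derivative(theta, data):
    """Second derivative of the k = 1 log-likelihood with respect to theta."""
    n, s = len(data), sum(data)
    return -s / theta**2 + (s + n) / (1.0 + theta)**2

def newton_raphson_mle(data, theta0=1.0, tol=1e-12, max_iter=100):
    """Solve score(theta) = 0 by Newton-Raphson, starting from theta0."""
    theta = theta0
    for _ in range(max_iter):
        step = score(theta, data) / score_derivative(theta, data)
        theta -= step
        if abs(step) < tol:
            break
    return theta

# Hypothetical count data, used only to exercise the iteration.
sample = [0, 1, 1, 2, 3, 0, 4, 2, 1, 5]
theta_hat = newton_raphson_mle(sample)
```

Here `theta_hat` should agree with the sample mean of `sample`. For k ≥ 2 the same iteration would be applied to the corresponding score equation, which has no closed-form root.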

Method of moments (MoM)
The MoM is one of the most popular, and probably the oldest, techniques in statistics for constructing an estimator. Let $\mu_j(\theta) = E_\theta(X^{j})$ be the theoretical moment and $\hat{\mu}_j = \int x^{j}\, d\hat{F}(x) = \frac{1}{n}\sum_{i=1}^{n} X_i^{j}$ be the empirical moment, where $\hat{F}$ is the empirical distribution function. By the law of large numbers, the empirical moments are close to the theoretical ones. The MoM estimators $\hat{\theta}_1, \ldots, \hat{\theta}_k$ of $\theta_1, \ldots, \theta_k$ are defined as solutions to the system of equations $\mu_j(\theta) = \hat{\mu}_j$, j = 1, ..., k. Table 2 shows some parameter estimation results for different values of k.
and B is the solution of the following equation, where $\bar{X}$ is the sample mean.
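Matching the first moment can be sketched numerically. The code below relies on the same unconfirmed assumption about the GD as a gamma mixture with weights proportional to Γ(j) θ^j, j = 1, ..., k; under it, the mean of the PGD equals the mean of the mixing distribution, Σ Γ(j + 1) θ^(j+1) / Σ Γ(j) θ^j. The estimator solves mean(θ) = sample mean by bisection. The function names are hypothetical, and the expressions in Table 2 of the paper may be written differently.

```python
import math

def pgd_mean(theta, k):
    """Mean of the PGD under the ASSUMED gamma-mixture form of the GD."""
    num = sum(math.gamma(j + 1) * theta**(j + 1) for j in range(1, k + 1))
    den = sum(math.gamma(j) * theta**j for j in range(1, k + 1))
    return num / den

def mom_estimate(sample_mean, k, tol=1e-12):
    """Solve pgd_mean(theta, k) = sample_mean for theta > 0 by bisection.

    Relies on pgd_mean being increasing in theta; sample_mean must be positive.
    """
    lo, hi = 0.0, 1.0
    while pgd_mean(hi, k) < sample_mean:   # expand the bracket until it contains the root
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if pgd_mean(mid, k) < sample_mean:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For k = 1 the estimate reduces to the sample mean. For k = 2 the assumed mean is θ(1 + 2θ)/(1 + θ), so the estimate is the positive root of 2θ² + (1 − x̄)θ − x̄ = 0.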

Real data application
Despite their small size, thunderstorms are among the most common and dangerous of all weather phenomena. One of the main concerns of meteorologists is to forecast afternoon convective thunderstorm activity at Cape Kennedy, Florida, because of the importance of thunderstorms in designing launch vehicles and planning space missions.
The PD is a statistical model suitable for independent events whose mean equals the variance, a condition that is not met in most data sets, including the thunderstorm data.
One may think of using the negative binomial distribution (NBD) as an alternative to the PD. However, fitting the NBD to count data requires the mean to be less than the variance, which does not hold for the thunderstorm data.
Therefore, to illustrate the application of the PGD, we consider the fit of the distribution to count data on thunderstorm events at Cape Kennedy, Florida, in two cases: for the month of June and for the summer season (June, July and August) over the eleven-year period from 1957 to 1967; see Tables 3 and 4 (Falls, Williford, and Carter 1971). The ML estimates have been obtained by numerical optimization of the log-likelihood function instead of solving (5). The Poisson-Lindley distribution (PLD) and the Poisson-Shanker distribution (PSD), introduced by Shanker, Hagos, and Sujatha (2015) and Shanker and Fesshaye (2017), have also been fitted for comparison.
It is customary to use the chi-square test only when none of the expected frequencies ($E_i$) is less than 5; sometimes this requires combining some of the cells, and we have done so here. The calculated value of chi-square is shown in Table 3. Note that the number of degrees of freedom is given by df = c − p − 1, where c is the number of terms in the chi-square statistic and p is the number of estimated parameters in the fitted distribution. The expected Poisson frequencies, along with the test statistics for this data set, are provided in Table 3 and indicate a poor fit of this model ($\chi^{2}$ = 31.93, df = 2, p-value = 1.16 × 10^−7). The other models fit much better than the Poisson model, and the best one is the PGD with k = 4 ($\chi^{2}$ = 1.09, df = 3, p-value = 0.7795) at the price of 3 df. The empirical and fitted distributions are compared in Figure 5.
For the summer-season data, the expected frequencies of the PD, PLD and PSD, along with the test statistics, are provided in Table 4 and indicate a poor fit of these models. The PGD with k = 3 and k = 4 fits slightly better than the other models. The empirical and fitted distributions are compared in Figure 6.
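The goodness-of-fit computation described above can be sketched as follows. The code computes the chi-square statistic from observed and expected frequencies, the degrees of freedom df = c − p − 1, and the upper-tail p-value, using the closed-form finite sums that hold for integer degrees of freedom. The function names are illustrative, and the sketch does not include the thunderstorm frequencies themselves or the pooling of cells with small expected counts.

```python
import math

def chi_square_stat(observed, expected):
    """Pearson statistic: sum of (O_i - E_i)^2 / E_i over the (pooled) cells."""
    return sum((o - e)**2 / e for o, e in zip(observed, expected))

def degrees_of_freedom(cells, estimated_params):
    """df = c - p - 1."""
    return cells - estimated_params - 1

def chi_square_sf(x, df):
    """Upper-tail probability P(chi2_df > x) for integer df >= 1."""
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        return math.exp(-half) * sum(half**i / math.factorial(i)
                                     for i in range(df // 2))
    # Odd df: erfc(sqrt(x/2)) + exp(-x/2) * sum_i (x/2)^(i - 1/2) / Gamma(i + 1/2)
    tail = sum(half**(i - 0.5) / math.gamma(i + 0.5)
               for i in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * tail
```

As a consistency check, `chi_square_sf(31.93, 2)` and `chi_square_sf(1.09, 3)` should agree, up to rounding, with the p-values reported above for the Poisson fit and for the PGD with k = 4.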

Concluding remarks
In this paper, a new discrete distribution, the Poisson-Gold distribution (PGD), has been introduced by compounding the Poisson and Gold distributions. Its properties were derived and discussed, including the factorial moments, the moment generating function, the probability generating function and the hazard rate. The unknown parameters were estimated by the maximum likelihood method and the method of moments. In addition, the goodness of fit of the PGD was examined in two applications to thunderstorm data, and the fit was compared with that of the PD, the Poisson-Lindley distribution (PLD) and the Poisson-Shanker distribution (PSD). The data analysis shows that the PGD gives a satisfactory fit to both data sets, slightly better than the competing distributions, whereas the PD does not appear to be an appropriate choice for thunderstorm data. The PGD is therefore recommended for fitting such data, and it may help meteorologists in the design of launch vehicles, the planning of space missions and launch operations at different stations when thunderstorm activity is present.