A New Family of Distributions Based on the Generalized Pearson Differential Equation with Some Applications

Recently, a generalization of the Pearson differential equation has appeared in the literature, from which a vast majority of continuous probability density functions (pdf's) can be generated; the resulting family is known as the generalized Pearson system of continuous probability distributions. This paper derives a new family of distributions based on the generalized Pearson differential equation, which is a natural generalization of the generalized inverse Gaussian distribution. Some characteristics of the new distribution are obtained. Plots of the cumulative distribution function, pdf, and hazard function are provided, together with tables of percentiles and of values of skewness and kurtosis. It is observed that the new distribution is skewed to the right and bears most of the properties of skewed distributions. As a motivation, the results are applied to a problem in forestry, where the newly proposed model is found to fit better than the gamma, log-normal, and inverse Gaussian distributions. Since many researchers have studied the use of the generalized inverse Gaussian distributions in the fields of biomedicine, demography, environmental and ecological sciences, finance, lifetime data, reliability theory, traffic data, etc., we hope the findings of the paper will be useful for practitioners in various fields of theoretical and applied sciences.

Austrian Journal of Statistics, Vol. 39 (2010), No. 3, 259–278


Introduction
Recently, a generalization of the Pearson differential equation has appeared in the literature:

\[ \frac{1}{f(x)} \frac{df(x)}{dx} = \frac{a_0 + a_1 x + \cdots + a_m x^m}{b_0 + b_1 x + \cdots + b_n x^n}, \tag{1} \]

where m, n ≥ 1 are arbitrary integers, and the coefficients $a_j$ and $b_j$ are real numbers. By proper choice of the parameters $a_j$ and $b_j$, a vast majority of continuous probability density functions (pdf's) can be generated from equation (1); the resulting family is known as the generalized Pearson system of continuous probability distributions. Note that the classical differential equation introduced by Karl Pearson during the late 19th century is a special case of (1). For details on the Pearson system of continuous probability distributions, the interested readers are referred to Elderton (1953), Stuart and Ord (1994), and Johnson, Kotz, and Balakrishnan (1994), among others. The well-known families of continuous probability distributions introduced by Karl Pearson, such as the normal and the Student t distributions (known as Pearson Type VII), the beta distribution (Pearson Type I), and the gamma distribution (Pearson Type III), can be generated as solutions to (1) by a proper choice of the parameters. For example, the normal distribution belongs to the generalized Pearson system of continuous probability distributions when m = 1, n = 2, $a_0 = -\mu$, $a_1 = 1$, $b_0 = -\sigma^2$, $b_1 = 0$, $b_2 = 0$.

It appears from the literature that not much attention has been paid to the study of the family of continuous pdf's that can be generated as a solution to the generalized Pearson differential equation (1), except Dunning and Hanson (1977), Chaudhry and Ahmad (1993), and recently Shakil, Singh, and Kibria (2010). In Dunning and Hanson (1977), a generalization of the Pearson curves is obtained as a solution of (1) that best fits a histogram in the mean square sense and satisfies certain statistical constraints. Chaudhry and Ahmad (1993) introduced a distribution whose pdf, equation (2), is a solution to equation (1).

This paper derives a new family of continuous pdf's as a solution to (1), which includes the p-th root reciprocal of the inverse Gaussian (IG) distribution, that is, the distribution of the random variable $X = 1/\sqrt[p]{Y}$, where Y has an IG distribution. It will be seen that this new distribution is more flexible and is a natural generalization of the IG and the generalized inverse Gaussian (GIG) distributions. It has also been observed that a number of other distributions, including those of Chaudhry and Ahmad (1993) and Chou and Huang (2004), are special cases of this distribution. For some discussions on IG and GIG, the interested readers are referred to Jørgensen (1982), Johnson et al. (1994), and Chou and Huang (2004), among others.

In what follows, some characteristics of our newly proposed distribution are derived, including the expressions for the normalizing constant, pdf, cumulative distribution function (cdf), k-th moment, Shannon's entropy, and relationships to other probability distributions. The infinite divisibility property of the newly proposed distribution family is discussed. Plots of the cdf, pdf, and hazard function, percentile points, and tables of Pearson's measures of skewness and kurtosis are provided for selected coefficients and parameters. The estimation of parameters by maximum likelihood and by the method of moments is discussed.

It is observed that the new distribution is skewed to the right and bears most of the properties of skewed distributions. Since many researchers have studied the uses of the IG and the GIG distributions in the fields of biomedicine, demography, environmental and ecological sciences, lifetime data, reliability theory, traffic data, etc., we hope the findings of the paper will be useful for practitioners in various fields of theoretical and applied sciences.
The organization of this paper is as follows. In Section 2, the pdf and cdf of the proposed distribution are provided. Section 3 discusses some characteristics of the new distribution. Some distributional relationships are presented in Section 4. The percentage points of the new distribution are given in Section 5. The statistical applications of the results are contained in Section 6. Some concluding remarks are provided in Section 7. The derivations of the cdf, pdf, k-th moment, etc., in this paper involve some special functions, which are provided in the Appendix.

Derivation of the New Probability Distribution
In this section, the new continuous pdf is derived as a solution to the generalized Pearson differential equation (1). The expression for the cdf of the new distribution is obtained. Some graphical representations of the pdf and cdf of the new distribution for some selected values of the parameters are provided.

Expressions for the Normalizing Constant and for the PDF
We consider the generalized Pearson differential equation (1) in the following form:

\[ \frac{1}{f_X(x)} \frac{df_X(x)}{dx} = \frac{a_0 + a_p x^p + a_{2p} x^{2p}}{b_{p+1} x^{p+1}}, \quad x > 0. \tag{3} \]

The solution to the differential equation (3) is given by

\[ f_X(x) = C x^{\nu - 1} \exp\left(-\alpha x^p - \beta x^{-p}\right), \tag{4} \]

where $\alpha = -a_{2p}/(p b_{p+1})$, $\beta = a_0/(p b_{p+1})$, $\nu = (a_p + b_{p+1})/b_{p+1}$, $b_{p+1} \neq 0$, p > 0, and C is the normalizing constant. According to the parameters {α, β, ν, p}, our newly proposed family of generalized GIG (GGIG) distributions may be classified into three classes, for which, using Lemma 1 and the definition of the gamma function (see the Appendix), the respective normalizing constants are easily evaluated:

1. Class I: α > 0, β > 0, ν ∈ R, and p > 0;

\[ C = \frac{p \, (\alpha/\beta)^{\nu/(2p)}}{2 K_{\nu/p}\left(2\sqrt{\alpha\beta}\right)}, \tag{5} \]

where $K_{\nu/p}(2\sqrt{\alpha\beta})$ denotes the modified Bessel function of the third kind (see the Appendix).
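The link between the differential equation and its solution can be checked numerically. The following is a minimal sketch, assuming that equation (3) has the form $(1/f)\,df/dx = (a_0 + a_p x^p + a_{2p} x^{2p})/(b_{p+1} x^{p+1})$ implied by the parameter definitions above, and that the unnormalized solution is $x^{\nu-1}\exp(-\alpha x^p - \beta x^{-p})$. The function names and the coefficient values are illustrative choices, not taken from the paper.

```python
import math


def ggig_params_from_ode(a0, ap, a2p, b, p):
    """Map the coefficients of the assumed form of (3) to (alpha, beta, nu)."""
    alpha = -a2p / (p * b)
    beta = a0 / (p * b)
    nu = (ap + b) / b
    return alpha, beta, nu


def log_kernel(x, alpha, beta, nu, p):
    """Log of the unnormalized solution x**(nu-1) * exp(-alpha*x**p - beta*x**(-p))."""
    return (nu - 1.0) * math.log(x) - alpha * x ** p - beta * x ** (-p)


def ode_rhs(x, a0, ap, a2p, b, p):
    """Right-hand side of the assumed form of (3)."""
    return (a0 + ap * x ** p + a2p * x ** (2 * p)) / (b * x ** (p + 1))


def log_derivative(x, alpha, beta, nu, p, h=1e-6):
    """Central-difference approximation of (1/f) df/dx = d(log f)/dx."""
    upper = log_kernel(x + h, alpha, beta, nu, p)
    lower = log_kernel(x - h, alpha, beta, nu, p)
    return (upper - lower) / (2.0 * h)


# Hypothetical coefficients chosen so that alpha > 0 and beta > 0 (Class I).
coeffs = dict(a0=2.0, ap=1.0, a2p=-3.0, b=1.0, p=2)
alpha, beta, nu = ggig_params_from_ode(**coeffs)
for x in (0.5, 1.0, 2.0):
    lhs = log_derivative(x, alpha, beta, nu, coeffs["p"])
    rhs = ode_rhs(x, **coeffs)
    print(f"x={x}: d(log f)/dx={lhs:.6f}, ODE right-hand side={rhs:.6f}")
```

The normalizing constant does not enter the check, because it cancels in the ratio $f'/f$.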
It is also worth noting that our family of GGIG distributions is closed under power transformations. That is, if X has a pdf of the form (4) with parameters (α, β, ν, p), then $X^s$ has a pdf of the same form with parameters (α, β, ν/s, p/s), where s > 0. One can use this property in information analysis; the interested readers are referred to Dadpay, Soofi, and Soyer (2007), where similar properties are considered in the context of the generalized gamma distribution family.

This paper considers Class I, as it is more general than the other two classes. Thus, from (4) and (5), for a random variable X, the following continuous pdf in terms of the modified Bessel function of the third kind is generated from the generalized Pearson differential equation (3):

\[ f_X(x) = \frac{p \, (\alpha/\beta)^{\nu/(2p)}}{2 K_{\nu/p}\left(2\sqrt{\alpha\beta}\right)} \, x^{\nu - 1} \exp\left(-\alpha x^p - \beta x^{-p}\right), \tag{6} \]

where x > 0, α > 0, β > 0, ν ∈ R, and p > 0. We refer to this as the GGIG distribution family. Note that ν and p are shape parameters, and α and β denote scale parameters.
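As an illustration of the Class I density, the sketch below evaluates the normalizing constant and the pdf, assuming the form $f_X(x) = C x^{\nu-1}\exp(-\alpha x^p - \beta x^{-p})$ with $C = p(\alpha/\beta)^{\nu/(2p)} / (2K_{\nu/p}(2\sqrt{\alpha\beta}))$. To keep it self-contained, the modified Bessel function is approximated with a simple trapezoidal rule on its standard integral representation; this helper, like all function names here, is an assumption of the sketch and is only meant for moderate parameter values.

```python
import math


def bessel_k(order, z, steps=4000):
    """Approximate K_order(z) for z > 0 (modified Bessel function of the third kind).

    Trapezoidal rule on K_v(z) = integral_0^inf exp(-z*cosh(t)) * cosh(v*t) dt.
    Intended for moderate |order| and z only; not a general-purpose routine.
    """
    t_max = math.acosh(max(800.0 / z, 2.0))  # exp(-z*cosh(t)) is 0.0 beyond this
    h = t_max / steps
    total = 0.5 * math.exp(-z)  # t = 0 endpoint, half weight
    for i in range(1, steps + 1):
        t = i * h
        weight = 0.5 if i == steps else 1.0
        total += weight * math.exp(-z * math.cosh(t)) * math.cosh(order * t)
    return total * h


def ggig_norm_const(alpha, beta, nu, p):
    """Normalizing constant C of the assumed Class I pdf."""
    z = 2.0 * math.sqrt(alpha * beta)
    return p * (alpha / beta) ** (nu / (2.0 * p)) / (2.0 * bessel_k(nu / p, z))


def ggig_pdf(x, alpha, beta, nu, p, const=None):
    """GGIG pdf at x; pass a precomputed `const` to avoid recomputing C."""
    if x <= 0.0:
        return 0.0
    if const is None:
        const = ggig_norm_const(alpha, beta, nu, p)
    return const * x ** (nu - 1.0) * math.exp(-alpha * x ** p - beta * x ** (-p))


# Power-transformation check: if X has parameters (alpha, beta, nu, p), the
# density of Y = X**s should equal the GGIG pdf with (alpha, beta, nu/s, p/s).
s, y = 2.0, 1.3
via_change_of_variable = ggig_pdf(y ** (1.0 / s), 1.0, 1.0, 1.0, 2.0) * y ** (1.0 / s - 1.0) / s
direct = ggig_pdf(y, 1.0, 1.0, 1.0 / s, 2.0 / s)
print(via_change_of_variable, direct)
```

For ν = 1 and p = 2 the Bessel function has the elementary form $K_{1/2}(z) = \sqrt{\pi/(2z)}\,e^{-z}$, so the constant reduces to $2\sqrt{\alpha/\pi}\,e^{2\sqrt{\alpha\beta}}$, which gives a convenient sanity check on the numerical helper.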
Using the definition of the Whittaker function (see Appendix A), equation (6) can also be expressed in terms of the Whittaker function.

Remark 1: Since the GIG distribution is used in finance as a mixing distribution in the context of the generalized hyperbolic distribution family (see, e.g., Prause, 1997, 1999), and since our proposed distribution family is a generalization of the GIG distribution (which we refer to as the GGIG distribution family), it is hoped that the GGIG family can be used in finance and other fields of statistical research.
Remark 2 (Infinite Divisibility of the GGIG Distribution Family): Note that when p = 1 and ν ∈ R, equation (6) reduces to the pdf of the GIG distribution. The infinite divisibility of the GIG distribution has been determined by Barndorff-Nielsen and Halgreen (1977), for which the interested readers are also referred to Marshall and Olkin (2007, p. 466) or Theorem 5.22 of Steutel and Harn (2004, p. 361). On the other hand, if p > 1, then by Theorem 9.1 of Steutel and Harn (2004, p. 115) it follows that $f_X(x)$ is not infinitely divisible.

Derivation of the CDF
Suppose X is a random variable with the pdf $f_X(x)$ given in (6). Then, using the definitions of the exponential and incomplete gamma functions, the cdf $F_X(x)$ of X can be expressed as in equation (7), where α > 0, β > 0, ν ∈ R, p > 0. By direct differentiation of the cdf in (7), and noting that $\partial\gamma(a, t)/\partial t = t^{a-1} e^{-t}$, it can be verified that $dF_X(x)/dx = f_X(x)$, where $f_X(x)$ denotes the pdf given in (6).

Using the series expansion of $\exp(-b t^{-1})$ in the definition of the generalized incomplete gamma function gives the expansion (8), in which $\gamma(\eta - k, z)$ denotes the ordinary incomplete gamma function. Substituting (8) into (7) yields the series expression (9) for the cdf. Using the definition of the Whittaker function, the cdf (9) can also be written in terms of the Whittaker function.

As a special case of equation (6) with ν = 1 and p = 2, substituting t = z/α in the corresponding integral and applying Lemma 2 (see the Appendix), the cdf of X is expressed in terms of the generalized incomplete gamma and error functions, equation (12), where α > 0, β > 0. Further, applying equations (2.130) and (2.131) of Chaudhry and Zubair (2002, p. 53), the cdf of X given in (12) can be expressed in terms of the incomplete gamma and confluent hypergeometric functions, respectively, which gives the two forms in (13). Noting that $\mathrm{erfc}(z/\sqrt{2}) = 2[1 - \Phi(z)]$, where Φ(z) denotes the cdf of the standard normal distribution, the cdf given by equations (13) can also be expressed in terms of Φ, for x > 0, α > 0, β > 0. Finally, applying Theorem 2.8 of Chaudhry and Zubair (2002, p. 57) and equation 28 of Erdélyi, Magnus, Oberhettinger, and Tricomi (1953, p. 226), the cdf of X given by equations (11) or (12) can be expressed in terms of $\Gamma_2(\cdot)$, Horn's hypergeometric series of two variables.
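For the special case ν = 1, p = 2, the cdf can be cross-checked numerically. The sketch below uses a closed form in terms of the complementary error function, $F_X(x) = \tfrac{1}{2}\left[\mathrm{erfc}(\sqrt{\beta}/x - \sqrt{\alpha}\,x) - e^{4\sqrt{\alpha\beta}}\,\mathrm{erfc}(\sqrt{\alpha}\,x + \sqrt{\beta}/x)\right]$, which was obtained here by direct integration of the assumed pdf $2\sqrt{\alpha/\pi}\,e^{2\sqrt{\alpha\beta}}\exp(-\alpha x^2 - \beta x^{-2})$ and is not transcribed from the paper's equations (12) and (13). It is compared against Simpson integration of the same pdf; all function names are illustrative.

```python
import math


def cdf_special_case(x, alpha, beta):
    """Closed-form cdf for nu = 1, p = 2 (derived for this sketch)."""
    a, b = math.sqrt(alpha), math.sqrt(beta)
    return 0.5 * (math.erfc(b / x - a * x)
                  - math.exp(4.0 * a * b) * math.erfc(a * x + b / x))


def cdf_numeric(x, alpha, beta, steps=4000):
    """Simpson's rule applied to the pdf on [0, x] for the same special case."""
    const = 2.0 * math.sqrt(alpha / math.pi) * math.exp(2.0 * math.sqrt(alpha * beta))

    def pdf(u):
        # The pdf tends to 0 as u -> 0+, so define it as 0 at the endpoint.
        return 0.0 if u <= 0.0 else const * math.exp(-alpha * u * u - beta / (u * u))

    h = x / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * pdf(i * h)
    return total * h / 3.0


def std_normal_cdf(z):
    """Standard normal cdf, Phi(z), via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


for x in (0.5, 1.0, 2.0):
    print(x, cdf_special_case(x, 1.0, 1.0), cdf_numeric(x, 1.0, 1.0))
```

The helper `std_normal_cdf` makes the relation between erfc and Φ explicit, so the same cdf can be rewritten in terms of Φ if preferred.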

Plots of the PDF and CDF of the Random Variable X
The possible shapes of the pdf (6) and cdf (9) of X are provided for some selected values of the parameters in Figures 1 to 4. The effects of the parameters can easily be seen from these graphs. For example, it is clear from Figures 1 and 2 that, for the selected values of the parameters, the distributions of X are positively (that is, right) skewed, with longer and heavier right tails.

Properties of the New Distribution

Mode
The mode is the value of x for which the pdf $f_X(x)$ is maximal. Differentiating (6), we have

\[ f_X'(x) = f_X(x) \left[ \frac{\nu - 1}{x} - \alpha p x^{p-1} + \beta p x^{-p-1} \right], \tag{14} \]

which, when equated to 0, gives the mode of the newly proposed pdf to be

\[ x_m = \left[ \frac{(\nu - 1) + \sqrt{(\nu - 1)^2 + 4\alpha\beta p^2}}{2\alpha p} \right]^{1/p}, \]

which is obtained by solving the quadratic equation $\alpha p (x^p)^2 - (\nu - 1)x^p - \beta p = 0$ in $x^p$ and ignoring the second root since, by our assumption, x > 0. Differentiating (14) once more, it can be seen by simple arguments that $f_X''(x_m) < 0$. Thus, the maximum value of the pdf (6) is given by $f_X(x_m)$. Clearly, the newly proposed pdf defined by (6) is unimodal.
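The mode can be computed directly from the quadratic equation $\alpha p (x^p)^2 - (\nu - 1)x^p - \beta p = 0$ stated above. The following minimal sketch takes its positive root and checks that the unnormalized log-density, assumed to be $(\nu-1)\log x - \alpha x^p - \beta x^{-p}$, is larger at the mode than at nearby points. The function names and the second parameter set are illustrative.

```python
import math


def ggig_mode(alpha, beta, nu, p):
    """Mode of the GGIG pdf from the positive root of the quadratic in y = x**p."""
    disc = (nu - 1.0) ** 2 + 4.0 * alpha * beta * p * p
    y = ((nu - 1.0) + math.sqrt(disc)) / (2.0 * alpha * p)
    return y ** (1.0 / p)


def log_kernel(x, alpha, beta, nu, p):
    """Log of the unnormalized pdf; the constant C does not affect the mode."""
    return (nu - 1.0) * math.log(x) - alpha * x ** p - beta * x ** (-p)


params = (2.0, 0.5, 3.0, 1.5)  # hypothetical (alpha, beta, nu, p)
x_m = ggig_mode(*params)
print("mode:", x_m)
print("log-kernel at 0.99*mode, mode, 1.01*mode:",
      log_kernel(0.99 * x_m, *params),
      log_kernel(x_m, *params),
      log_kernel(1.01 * x_m, *params))
```

Since α, β, and p are positive, the discriminant is positive and the other root of the quadratic is negative, so exactly one admissible value of $x^p$ exists.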

k-th Moment (About the Origin)
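Suppose X is a random variable with pdf $f_X(x)$ as given in (6). Then, using Lemma 1, the k-th moment, $\alpha_k$, of X, for some integer k > 0, is expressed in terms of the Macdonald function as

\[ \alpha_k = E(X^k) = \left(\frac{\beta}{\alpha}\right)^{k/(2p)} \frac{K_{(\nu + k)/p}\left(2\sqrt{\alpha\beta}\right)}{K_{\nu/p}\left(2\sqrt{\alpha\beta}\right)}. \tag{15} \]

Using the definition of the Whittaker function, $\alpha_k$ can also be expressed in terms of the Whittaker function, equation (16).

Note: From equations (15) and (16), one can easily obtain the first, second, and higher moments. The moments of different orders when p = 2 and ν = 1 in (15) can be found in Chaudhry and Ahmad (1993).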
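The moments can be checked numerically without a Bessel routine, since $E(X^k)$ is the ratio of two integrals of the kernel $x^{s-1}\exp(-\alpha x^p - \beta x^{-p})$ with s = ν + k and s = ν. The sketch below assumes this form of the pdf and evaluates the integrals after the substitution x = exp(u); the helper `kernel_integral` and its truncation rule are assumptions of this sketch. For p = 1 and ν = 1/2 the Macdonald functions of half-integer order are elementary, so the ratio $K_{3/2}(z)/K_{1/2}(z) = 1 + 1/z$ provides an independent check.

```python
import math


def kernel_integral(s, alpha, beta, p, steps=4000):
    """Integral over (0, inf) of x**(s-1) * exp(-alpha*x**p - beta*x**(-p)).

    With x = exp(u) the integrand decays double-exponentially in u, so a
    plain trapezoidal rule on a truncated u-range is accurate.
    Assumes 0 < alpha, beta < 800 and moderate |s|.
    """
    lo = -math.log(800.0 / beta) / p
    hi = math.log(800.0 / alpha) / p
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        u = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * math.exp(s * u - alpha * math.exp(p * u) - beta * math.exp(-p * u))
    return total * h


def ggig_raw_moment(k, alpha, beta, nu, p):
    """k-th moment about the origin as a ratio of two kernel integrals."""
    return kernel_integral(nu + k, alpha, beta, p) / kernel_integral(nu, alpha, beta, p)


# Half-integer check with alpha = beta = 1, nu = 1/2, p = 1, so z = 2:
# E(X) = K_{3/2}(2) / K_{1/2}(2) = 1 + 1/2.
print(ggig_raw_moment(1, 1.0, 1.0, 0.5, 1.0))
```

Using a ratio of integrals keeps the normalizing constant out of the computation entirely, which avoids one source of numerical error.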

k-th Central Moment
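It is easy to see that the k-th central moment, $\beta_k$, of X is

\[ \beta_k = E\left[(X - E(X))^k\right] = \sum_{j=0}^{k} \binom{k}{j} (-1)^j E^j(X) \, E(X^{k-j}), \tag{17} \]

where $E^j(X)$ and $E(X^{k-j})$ are obtained from equations (15) and (16). From (17), one can obtain the second, third, and higher central moments.

Pearson's measure of skewness $\gamma_1$ and the kurtosis $\gamma_2$ are given by

\[ \gamma_1 = \frac{\beta_3}{\beta_2^{3/2}}, \qquad \gamma_2 = \frac{\beta_4}{\beta_2^{2}}, \]

where the variance $\beta_2$ and the third and fourth central moments $\beta_3$ and $\beta_4$ are obtained from (17) by taking k = 2, k = 3, and k = 4, respectively.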
Using a Maple 11 program, numerical values of the skewness $\gamma_1$ and the kurtosis $\gamma_2$ for some selected values of the parameters are provided in Tables 1 to 3. It is evident from these computations that the skewness is positive, which implies that the distribution of X is positively skewed. Moreover, for all selected values of the parameters the kurtosis satisfies $\gamma_2 > 3$, implying heavier tails than the normal distribution, except for α = 1, β = 1, ν = 2, p = 4, for which $\gamma_2 = 2.908 < 3$, implying lighter tails.
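The skewness and kurtosis computations can be sketched in a few lines, under the assumptions that the central moments follow the binomial expansion $\beta_k = \sum_j \binom{k}{j}(-1)^j E^j(X)E(X^{k-j})$ and that $\gamma_1 = \beta_3/\beta_2^{3/2}$ and $\gamma_2 = \beta_4/\beta_2^2$. The raw moments are computed by numerical integration of the assumed GGIG kernel rather than with Maple; the helper names are illustrative.

```python
import math


def kernel_integral(s, alpha, beta, p, steps=4000):
    """Integral over (0, inf) of x**(s-1) * exp(-alpha*x**p - beta*x**(-p)).

    Trapezoidal rule after the substitution x = exp(u).
    Assumes 0 < alpha, beta < 800 and moderate |s|.
    """
    lo = -math.log(800.0 / beta) / p
    hi = math.log(800.0 / alpha) / p
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        u = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * math.exp(s * u - alpha * math.exp(p * u) - beta * math.exp(-p * u))
    return total * h


def ggig_raw_moments(alpha, beta, nu, p, max_order=4):
    """List [E(X^0), E(X^1), ..., E(X^max_order)] for the assumed GGIG pdf."""
    base = kernel_integral(nu, alpha, beta, p)
    return [kernel_integral(nu + k, alpha, beta, p) / base for k in range(max_order + 1)]


def central_moment(raw, k):
    """k-th central moment from raw moments via the binomial expansion."""
    mean = raw[1]
    return sum(math.comb(k, j) * (-mean) ** j * raw[k - j] for j in range(k + 1))


def skewness_kurtosis(raw):
    """Return (gamma_1, gamma_2) = (beta_3 / beta_2**1.5, beta_4 / beta_2**2)."""
    b2, b3, b4 = (central_moment(raw, k) for k in (2, 3, 4))
    return b3 / b2 ** 1.5, b4 / b2 ** 2


print(skewness_kurtosis(ggig_raw_moments(1.0, 1.0, 0.5, 1.0)))
```

For the parameter set in the example (α = β = 1, ν = 1/2, p = 1), the raw moments have elementary closed forms, which makes it a convenient test case before trying other parameter values.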

Characteristic Function and r-th Cumulant
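It is easy to see that the characteristic function of X is given by

\[ E\left(e^{itX}\right) = \sum_{k=0}^{\infty} \frac{(it)^k}{k!} E(X^k), \tag{18} \]

where $i = \sqrt{-1}$ is the imaginary unit ($i^2 = -1$) and $E(X^k)$ denotes the k-th moment of X about the origin, which can be obtained from (15) and (16). The r-th cumulant $\kappa_r$ of X having the characteristic function (18) is given by

\[ \kappa_r = \frac{1}{i^r} \left[ \frac{d^r}{dt^r} \ln E\left(e^{itX}\right) \right]_{t=0}, \]

from which, by successive differentiation, it can be seen that $\kappa_1 = \alpha_1$, $\kappa_2 = \beta_2$, $\kappa_3 = \beta_3$, and $\kappa_4 = \beta_4 - 3\beta_2^2$.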

Entropy
Entropy provides an excellent tool to quantify the amount of information (or uncertainty) contained in a random observation regarding its parent distribution (population). A large value of entropy implies greater uncertainty in the data. As proposed by Shannon (1948), the entropy of an absolutely continuous random variable X having pdf $\varphi_X(x)$ is defined as

\[ H(X) = E\left[-\ln \varphi_X(X)\right] = -\int_S \varphi_X(x) \ln \varphi_X(x) \, dx, \tag{19} \]

where $S = \{x : \varphi_X(x) > 0\}$. Thus, in view of (19), the entropy of X having pdf (6) with normalizing constant C given by (5) is expressed as

\[ H(X) = -\ln C - (\nu - 1) E(\ln X) + \alpha E(X^p) + \beta E(X^{-p}). \tag{20} \]

Special case: For a particular restriction on the parameters of the pdf (6), applying equation (4.356.1) of Gradshteyn and Ryzhik (2000, p. 577), the expression for the entropy given in (20) reduces to a simpler closed form.

Survival and Hazard Functions
The survival and hazard functions of the proposed distribution are, respectively, given by

\[ S_X(x) = 1 - F_X(x) \tag{22} \]

and

\[ h_X(x) = \frac{f_X(x)}{1 - F_X(x)}, \tag{23} \]

where x > 0, α > 0, β > 0, ν ∈ R, and p > 0. The possible shapes of the hazard function (23) are provided in Figure 5 for some selected values of the parameters.

For maximum likelihood estimation, differentiating the log-likelihood function R = log(L) with respect to each of the parameters α, β, ν, and p and setting the derivatives to zero gives the system of equations (24) to (27), in which the derivatives $K_\lambda'(\cdot)$ and $\partial K_\lambda(\cdot)/\partial\lambda$ of the modified Bessel function $K_\lambda(\cdot)$ of the third kind with respect to the argument and the index (or order) λ, respectively, may be computed using the analytical formulas provided in Abramowitz and Stegun (1970) and Gradshteyn and Ryzhik (2000). Note that when p = 1 and ν = −1/2, the maximum likelihood estimates of the parameters α and β can be found in Koutrouvelis, Canavos, and Meintanis (2005). For p = 1 and ν ∈ R, equation (6) reduces to the pdf of the GIG distribution; for the maximum likelihood estimates of the parameters α, β, and ν of the GIG distribution, the interested readers are referred to Jørgensen (1982). When p = 2 and ν = 1, the maximum likelihood estimates of the parameters α and β can be found in Chaudhry and Ahmad (1993). When p > 2 and ν ∈ R, the maximum likelihood estimates of the parameters α, β, ν, and p are determined by solving equations (24) to (27) following the iteration methods developed in Lee (2010) and using Maple 11.
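The maximum likelihood equations can be explored numerically without the analytical Bessel derivatives. The sketch below evaluates the log-likelihood $R = \sum_i \log f(x_i)$ for the assumed pdf $C x^{\nu-1}\exp(-\alpha x^p - \beta x^{-p})$, with C obtained by numerical integration, and approximates the score vector by central differences. It is not the iteration scheme of Lee (2010) or the paper's Maple code; the function names are hypothetical, and a root finder or optimizer would still be needed to solve for the estimates.

```python
import math


def kernel_integral(s, alpha, beta, p, steps=4000):
    """Integral over (0, inf) of x**(s-1) * exp(-alpha*x**p - beta*x**(-p)).

    Trapezoidal rule after the substitution x = exp(u).
    Assumes 0 < alpha, beta < 800 and moderate |s|.
    """
    lo = -math.log(800.0 / beta) / p
    hi = math.log(800.0 / alpha) / p
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        u = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * math.exp(s * u - alpha * math.exp(p * u) - beta * math.exp(-p * u))
    return total * h


def ggig_loglik(sample, alpha, beta, nu, p):
    """Log-likelihood R = sum(log f(x_i)) for the assumed GGIG pdf."""
    log_c = -math.log(kernel_integral(nu, alpha, beta, p))
    return sum(log_c + (nu - 1.0) * math.log(x) - alpha * x ** p - beta * x ** (-p)
               for x in sample)


def ggig_score(sample, params, rel_step=1e-6):
    """Central-difference gradient of R with respect to (alpha, beta, nu, p)."""
    grad = []
    for j, value in enumerate(params):
        h = rel_step * max(abs(value), 1.0)
        up = list(params)
        down = list(params)
        up[j] = value + h
        down[j] = value - h
        grad.append((ggig_loglik(sample, *up) - ggig_loglik(sample, *down)) / (2.0 * h))
    return grad


sample = [0.8, 1.1, 1.4, 0.9, 1.2]  # made-up observations, for illustration only
print(ggig_score(sample, (1.0, 1.0, 1.0, 2.0)))
```

The maximum likelihood estimates are the parameter values at which all four components of the score vanish. Differentiating log C shows that the α-component equals $n E(X^p) - \sum_i x_i^p$, which gives a simple way to validate the finite-difference result.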

The Method of Moments
The first four moments of the random variable X with pdf $f_X(x)$ in (6) are given by

\[ E(X^k) = \left(\frac{\beta}{\alpha}\right)^{k/(2p)} \frac{K_{(\nu + k)/p}\left(2\sqrt{\alpha\beta}\right)}{K_{\nu/p}\left(2\sqrt{\alpha\beta}\right)}, \tag{28} \]

where ν ∈ R, α > 0, β > 0, p > 0, and k = 1, 2, 3, 4. Since the moment equation (28) depends on the modified Bessel function $K_\lambda(\cdot)$ of the third kind, the moment estimates of the parameters α, β, ν, and p are determined by solving the system of equations (28) with the Newton-Raphson iteration method, using Maple 11.
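To illustrate the Newton-Raphson idea on a reduced problem, the sketch below solves a single moment equation for α while holding β, ν, and p fixed. This is a deliberate simplification of the four-parameter system, not the paper's Maple procedure. The mean is computed by numerical integration of the assumed kernel, the derivative is approximated by central differences, and the function names and starting value are illustrative.

```python
import math


def kernel_integral(s, alpha, beta, p, steps=4000):
    """Integral over (0, inf) of x**(s-1) * exp(-alpha*x**p - beta*x**(-p)).

    Trapezoidal rule after the substitution x = exp(u).
    Assumes 0 < alpha, beta < 800 and moderate |s|.
    """
    lo = -math.log(800.0 / beta) / p
    hi = math.log(800.0 / alpha) / p
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        u = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * math.exp(s * u - alpha * math.exp(p * u) - beta * math.exp(-p * u))
    return total * h


def ggig_mean(alpha, beta, nu, p):
    """E(X) for the assumed GGIG pdf."""
    return kernel_integral(nu + 1.0, alpha, beta, p) / kernel_integral(nu, alpha, beta, p)


def solve_alpha_by_moments(target_mean, beta, nu, p, alpha0, tol=1e-12, max_iter=50):
    """Newton-Raphson on g(alpha) = E(X; alpha) - target_mean, other parameters fixed."""
    alpha = alpha0
    for _ in range(max_iter):
        g = ggig_mean(alpha, beta, nu, p) - target_mean
        h = 1e-6 * alpha
        dg = (ggig_mean(alpha + h, beta, nu, p) - ggig_mean(alpha - h, beta, nu, p)) / (2.0 * h)
        step = g / dg
        alpha -= step
        if abs(step) < tol:
            break
    return alpha


# With beta = 1, nu = 1/2, p = 1, the mean equals 1.5 exactly when alpha = 1.
print(solve_alpha_by_moments(1.5, 1.0, 0.5, 1.0, alpha0=0.5))
```

The starting value is chosen below the root, where the mean is a decreasing convex function of α in this example, so the iterates approach the root from one side. A general four-parameter solver would need a Jacobian and safeguards against leaving the admissible region α > 0, β > 0, p > 0.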

Distributional Relationships
It is easy to see that, by a simple transformation of the variable X or by taking special values of the parameters α, β, ν, p in equation (6), a number of distributions, including those of Chaudhry and Ahmad (1993) and Chou and Huang (2004), are special cases of our proposed distribution. Thus, our distribution defines a new family.

Percentiles
This section computes the percentage points of the distribution with pdf given in (6). For any 0 < q < 1, the 100q-th percentile (also called the quantile of order q) is a number $x_q$ such that the area under $f_X(x)$ to the left of $x_q$ is q. That is, $x_q$ is any root of the equation

\[ F_X(x_q) = \int_0^{x_q} f_X(u) \, du = q. \]

By numerically solving this equation with the cdf in (7), percentage points $x_q$ associated with the cdf of X are computed for some selected values of the parameters. These are provided in Tables 4 to 7.
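A percentile can be obtained by bisection on the cdf, as in the following minimal sketch. It assumes the pdf is proportional to $x^{\nu-1}\exp(-\alpha x^p - \beta x^{-p})$ and computes the cdf by Simpson integration after the substitution x = exp(u), rather than from the closed form (7). The function names, the truncation rule, and the iteration count are choices made for this illustration.

```python
import math


def _integrand(u, alpha, beta, nu, p):
    """Kernel of the pdf after the substitution x = exp(u)."""
    return math.exp(nu * u - alpha * math.exp(p * u) - beta * math.exp(-p * u))


def _simpson(lo, hi, alpha, beta, nu, p, steps=2000):
    """Simpson's rule on [lo, hi]; steps must be even."""
    h = (hi - lo) / steps
    total = _integrand(lo, alpha, beta, nu, p) + _integrand(hi, alpha, beta, nu, p)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * _integrand(lo + i * h, alpha, beta, nu, p)
    return total * h / 3.0


def _u_range(alpha, beta, p):
    """Truncation range in u; assumes 0 < alpha, beta < 800."""
    return -math.log(800.0 / beta) / p, math.log(800.0 / alpha) / p


def ggig_cdf(x, alpha, beta, nu, p, total=None):
    """Numerical cdf of the assumed GGIG pdf at x > 0."""
    lo, hi = _u_range(alpha, beta, p)
    if total is None:
        total = _simpson(lo, hi, alpha, beta, nu, p)
    u = min(max(math.log(x), lo), hi)
    return _simpson(lo, u, alpha, beta, nu, p) / total


def ggig_quantile(q, alpha, beta, nu, p, iters=60):
    """Quantile of order q (0 < q < 1) by bisection on the numerical cdf."""
    lo_u, hi_u = _u_range(alpha, beta, p)
    total = _simpson(lo_u, hi_u, alpha, beta, nu, p)
    lo, hi = math.exp(lo_u), math.exp(hi_u)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if ggig_cdf(mid, alpha, beta, nu, p, total=total) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for q in (0.25, 0.5, 0.75):
    print(q, ggig_quantile(q, 1.0, 1.0, 1.0, 2.0))
```

Bisection is slower than Newton-type methods but needs only a monotone cdf, which makes it a robust default when tabulating percentage points over many parameter combinations.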

Applications
To illustrate the performance of our distribution, an example of tree circumferences in Marshall, Minnesota (based on data from Rice, 1999), has been considered in this section.

Estimation of the Parameters

The Method of Maximum Likelihood
Given a sample $\{x_i\}$, i = 1, ..., n, the likelihood function corresponding to (6) is given by $L = \prod_{i=1}^{n} f(x_i)$. The objective of the likelihood function approach is to determine those values of the parameters that maximize L. Suppose $R = \log(L) = \sum_{i=1}^{n} \log f(x_i)$. Then the maximum likelihood estimates of the parameters α, β, ν, and p are obtained by solving the maximum likelihood equations ∂R/∂α = 0, ∂R/∂β = 0, ∂R/∂ν = 0, and ∂R/∂p = 0.

Figure 6: Fitting of the pdfs of the GGIG, IG, log-normal, and gamma models to the tree measurements data.

Table 2: Moments for the parameters α