Bivariate Poisson Generalized Lindley Distributions and the Associated BINAR(1) Processes

This paper proposes new bivariate distributions based on the Poisson generalized Lind-ley distribution as marginal. These models include the basic bivariate Poisson generalized Lindley (BPGL) and the Sarmanov-based bivariate Poisson generalized Lindley (SPGL) distributions. Subsequently, we introduce the BPGL and SPGL distributions as joint innovation distributions in a novel bivariate ﬁrst-order integer-valued autoregressive process (BINAR(1)) based on binomial thinning. The model parameters in the BPGL and SPGL distributions are estimated using the method of maximum likelihood (ML) while we apply the conditional maximum likelihood (CML) for the BINAR(1) process. We conduct some simulation experiments to assess the small and large sample performances. Further, we implement the new BINAR(1)s to the Pittsburgh crime series data and they show better ﬁtting criteria than other competing BINAR(1) models in the literature.


Introduction
Modelling the probabilistic behaviour of two random variables simultaneously demanded the construction of bivariate random variables.Bivariate distributions may be thus considered extensions of univariate distributions.Many approaches have been discussed in the literature for constructing bivariate random variables.Balakrishnan and Lai (2009) mentions many of those.Constructing discrete and continuous bivariate rv using the mixture methodology is discussed frequently in the statistical literature (see Karlis and Xekalaki (2005); Lai (2006); Sarabia Alegría and Gómez Déniz (2008) etc).The main advantage of this method is that it will have simple expressions for its marginal probability density function (pdf) and hence its moments, correlation, etc.Another method is to use a new class of distribution for the construction.The Sarmanov family (Sarmanov (1966)) of distributions can be used to construct discrete and continuous bivariate distributions with a flexible covariance structure.Ting Lee (1996) studied various properties regarding the family and constructed bivariate distributions using various marginal distributions and construction methods.Also, the Farlie-Gumbel-Morgenstern (FGM) copula is a special case of the Sarmanov family.The first-order integervalued autoregressive (INAR(1)) model is suitable for time series count datasets that exhibit over-dispersion.In many statistical and applied fields, such as medical insurance, sports, and finance, we can see those kinds of datasets.Since the initial works of McKenzie (1985) and Al-Osh and Alzaid (1987) on INAR(1) process with Poisson innovations, a significant number of works has been introduced in literature having univariate innovation distributions (see, Lívio, Khan, Bourguignon, and Bakouch (2018); Altun (2020a); Eliwa, Altun, El-Dawoody, and El-Morshedy (2020); Altun and Khan (2022); Irshad, Chesneau, D'cruz, and Maya (2021) etc).Irshad, D'cruz, Maya, and Khan (2023) proposed a discrete univariate distribution, the twoparameter Poisson generalized Lindley (TPPGL) distribution, by mixing the Poisson distribution with the new generalized Lindley (NGL) distribution by Abouammoh, Alshangiti, and Ragab (2015).They introduced a count regression model and an INAR(1) model based on the TPPGL distribution (INAR(1)TPPGL) for modelling over-dispersed datasets.Also, it is proved that INAR(1)TPPGL provides a better model than many other recently developed INAR(1) models based on some model selection criterion which immensely motivates us to propose a BINAR(1) process with bivariate TPPGL innovations.Hence in this paper, discrete bivariate distributions based on the TPPGL distribution are constructed by using the mixture methodology (the basic bivariate Poisson generalized Lindley (BPGL)) as well as using the Sarmanov family of distributions (the Sarmanov bivariate Poisson generalized Lindley (SPGL)).Most importantly, these established distributions are mounted as innovations for BINAR(1)(that is., BINAR(1)BPGL and BINAR(1)SPGL), and both processes are compared with some other recently proposed BINAR(1) models as well as discussed in Pedeli and Karlis (2011).We first define the development of the TPPGL distribution and associated INAR(1) process.Then bivariate versions are constructed and adapted to BINAR(1) with bivariate TPPGL innovations (BPGL and SPGL) by inducing a cross-correlation between the counting series by assuming that the paired TPPGL innovations are jointly distributed.
The remaining parts of the paper is organized as follows: Section 2 reviews the TPPGL distribution and the associated INAR(1) process.The BPGL distribution, its corresponding BINAR(1) process, related properties, estimation of parameters, and its simulation study is given in Section 3. Similarly, Section 4 derives the SPGL distribution with its associated BINAR(1) process, corresponding properties, estimation of parameters, and simulation study.The practical importance of the proposed BINAR(1)s are studied in Section 5.

Development of the TPPGL distribution and the associated INAR(1) process
The NGL distribution with parameters α and θ introduced by Abouammoh et al. (2015) has a pdf given by Recently, Irshad et al. (2023) introduced a mixture distribution by mixing the Poisson distribution with the NGL distribution, that is, if X denote the random variable having the TPPGL distribution such that X|λ ∼ P (λ) and λ|α, θ ∼ N GL(α, θ), where λ > 0, θ ≥ 0, α ≥ 1.Then the unconditional probability mass function (pmf) of X having the TPPGL distribution is Thus the probability generating function (pgf), moment generating function (mgf), and hence the mean and variance of X are obtained such that and Also, the dispersion index (DI) of X is obtained as which indicates a clear case of over-dispersion (since DI>1).Therefore, the TPPGL distribution is used as an innovation distribution for the INAR(1) process as the INAR(1)TPPGL process and it is proved that INAR(1)TPPGL gives better results for AIC, empirical mean and variance, than many other recently proposed INAR(1) processes based on real count datasets.The process {X t } t∈Z which follows where ε t is an independent and identically distributed sequence of random variables having the TPPGL distribution is said to be the INAR(1)TPPGL process.Also, ε t is independent of X t−k for all k ≥ 1 and counting series in the binomial thinning operator •.Now the one-step transition probability of the INAR(1)TPPGL process is , k, l ≥ 0. (2.1) The mean, variance, DI, conditional mean and conditional variance are given by Here also, DI greater than 1 implying over-dispersion of the process.

The BPGL distribution and the corresponding BINAR(1) process
Following Gomez-Deniz, Sarabia, and Balakrishnan (2012), the BPGL distribution based on the TPPGL distribution is developed in this section.
Proof.By the procedure defined in Gomez-Deniz et al. (2012), the pmf of the BPGL distribution is Hence it is proven.
c) the joint pgf of X is Following proposition 2, we can find the moment-related properties.
The mean of X i , and the covariance of which will be always positive.

Estimation of parameters of the BPGL distribution
Here the method of ML is used to estimate the unknown parameters.Suppose (X 1i , X 2i ), i = 1, 2, ..., n be the observations of a random sample from BPGL(θ, α, φ 1 , φ 2 ).Then the log of the likelihood function is where β = (θ, α, φ 1 , φ 2 ).The ML estimators of β are obtained by maximizing (3.2) using numerical methods with the help of statistical software.The asymptotic normality and consistency properties of the ML estimators follow from the well-known Lindberg-Levy central limit theorem.Therefore, the ML estimators of β, β is such that, ( β − β) ∼ N(0,I −1 ), where I −1 is the inverse of the Fisher information matrix.Here we have used the quasi-Newton approach, the BFGS algorithm available in the optim function of the R programming language, to find the ML estimators for the β.
The plotted figures show that for the MLEs, their bias and MSEs corresponding to each parameter decrease as the sample size increases.
Now suppose the innovation vector t = ( t,1 , t,2 ) have the BPGL(θ, α, φ 1 , φ 2 ) and t,k is independent from Y s;j , j = 1, 2 for each t and s, s < t.Also, innovations are independent of the counting series in the binomial thinning operator •.Then the resulting Y t = (Y t,1 , Y t,2 ); t = 1, 2, ... will have the BINAR(1) process with the BPGL innovation denoted by BINAR(1)BPGL and the below-mentioned proposition defines some of its properties.
b) the conditional mean and the conditional variance of components of the process for and d) the conditional joint pmf of the process where u = min(y t,1 ; y t−1,1 ), v = min(y t,2 ; y t−1,2 ), Also, the stationary condition for the model is that 0 < p i < 1 (Khan, Oncel Cekim, and Ozel 2020) and E(Y t,i ), Var(Y t,i ) for i = 1, 2 and Cov(Y t,1 , Y t,2 ) does not depend on t and Var(Y t,i ) is finite under the conditions mentioned.

Estimation of parameters of the BINAR(1)BPGL process
Suppose {Y t,i , t = 1, 2, ..., n, i = 1, 2} is a random sample of size n taken from the BI-NAR(1)BPGL process.Here the CML approach is used to estimate the parametric vector Θ = (θ, α, φ 1 , φ 2 , p 1 , p 2 ) of the BINAR(1)BPGL process.The conditional log-likelihood function of the BINAR(1)BPGL process is where Θ = (θ, α, φ 1 , φ 2 , p 1 , p 2 ) is the unknown parametric vector to be estimated and P (y t | y t−1 ) is given by (3.3).The CML estimators are obtained by maximizing (3.4).Furthermore, the consistency and asymptotic normality of the CML estimators under standard regularity conditions are demonstrated in Andersen (1970) and Bu and McCabe (2008).That is, the CML estimator of Θ, Θ is such that, ( Θ − Θ) ∼ N(0, I −1 ), where I −1 is the inverse of the Fisher information matrix (Freeland and McCabe (2004)).Here we used the quasi-Newton approach, the BFGS algorithm available in the optim function of the R programming language to find the CML estimators for the parameters in the BINAR(1)BPGL process.

Simulation study for the BINAR(1)BPGL process
The CML estimates (CMLEs) obtained for the unknown parameters of the BINAR(1)BPGL process is assessed through a simulation study.Hence N =1000 samples each of sizes n=25, 50, 100 are taken for two sets of parametric values (θ = 0.2, α = 1.1, φ 1 = 1, φ 2 = 2, p 1 = 0.5, p 2 = 0.2) and (θ = 0.5, α = 1.3, φ 1 = 1.7, φ 2 = 2.3, p 1 = 0.6, p 2 = 0.3).For each n, CMLEs, its bias and MSEs were calculated and reported through boxplots.The simulation results are illustrated in Figures 7, 8, 9, 10, 11 and 12.That is, for the set of parameter values, Sarmanov (Sarmanov 1966) introduced the Sarmanov family of distributions.Suppose X 1 and X 2 be two random variables each with pmf P (X 1 = x 1 ) and P (X 2 = x 2 ) and with supports defined on A 1 ⊆ R and A 2 ⊆ R respectively.Now let q i (x i ), i = 1, 2 are bounded non-constant functions such as Then, the joint pmf for the Sarmanov family, where the factor ωq 1 (x 1 )q 2 (x 2 ) is a measure of departure of two variables X 1 , X 2 from independence and ω is a real number that satisfies the condition Depending on the choices for the functions q i (x) , i = 1, 2, we can derive different cases.Here we use q i (x i ) = e −x i − L i (1) x i = 0, 1, ... as mentioned in Ting Lee (1996), where L i (1) is the value of the Laplace transform of the marginal distribution evaluated at s = 1, that is where P (.) is the marginal distribution.Hence, we derive the SPGL distribution having TPPGL as marginals.The Laplace transform for the TPPGL distribution, Then the joint pmf for the Sarmanov bivariate distribution takes the form, where L i (.) is the Laplace transform for the i th marginal, i = 1, 2.

Estimation of parameters of the SPGL distribution
The ML method is used for the estimation of unknown parameters.Let (X 1i , X 2i ), i = 1, 2, ..., n be the observations of a random sample from SPGL(θ 1 , θ 2 , α, ω).Then the loglikelihood function, where λ = (θ 1 , θ 2 , α, ω) and P (X 1 = x 1i , X 2 = x 2i ) is the pmf of SPGL distribution defined in (4.2). (4.3) have to be maximized to find the estimators for λ.The asymptotic normality and consistency properties of the ML estimators follow from the well-known Lindberg-Levy central limit theorem.Consequently, the ML estimators of λ, λ is such that, ( λ − λ) ∼ N(0,I −1 ), where I −1 is the inverse of the Fisher information matrix.Here we have used the quasi-Newton approach, the BFGS algorithm available in the optim function of the R programming language to find the ML estimators for the unknown parameters.
Proposition 5. Suppose the bivariate random vector Y t = (Y t,1 , Y t,2 ); t = 1, 2, ... follows the BINAR(1)SPGL, then a) for i=1,2, the mean, variance and DI of Y t,i , b) the conditional mean and the conditional variance of components of the process for i=1,2, and c) the covariance of t,1 and t,2 , d) the covariance of two components of the BINAR(1)SPGL process, is obtained as e) the conditional joint pmf of the BINAR(1)SPGL process can be obtained by using (3.3) except that P ( t,1 = 1 , t,2 = 2 ) is swaped by (4.2).
Also, the stationary condition for the model is that 0 < p i < 1 and E(Y t,i ), Var(Y t,i ) for i = 1, 2 and Cov(Y t,1 , Y t,2 ) does not depend on t and Var(Y t,i ) is finite under the conditions mentioned.

Estimation of parameters of the BINAR(1)SPGL process
We here use the method of CML to estimate the parameters.Suppose {Y t,i , t = 1, 2, ..., n, i = 1, 2} is a random sample of size n taken from the BINAR(1)SPGL process.Then the conditional log-likelihood function is such that, where Θ = (θ 1 , θ 2 , α, ω, p 1 , p 2 ) is the unknown parametric vector to be estimated and P (y t | y t−1 ) is obtained by substituting the joint pmf of ( t,1 , t,2 ) by (4.2) in (3.3) .The CML estimators are obtained by maximizing (4.4).Also, the consistency and asymptotic normality of the CML estimators under standard regularity conditions are demonstrated in Andersen (1970) and Bu and McCabe (2008).That is, the CML estimator for Θ , Θ is such that, ( Θ − Θ ) ∼ N(0,I −1 ), where I −1 is the inverse of the Fisher information matrix (Freeland and McCabe (2004)).Here we have used the quasi-Newton approach, the BFGS algorithm available in optim function of the R programming language to find the CML estimators for the parameters in the BINAR(1)SPGL process.

Empirical study
In this section we do some testing procedures for the new bivariate distributions and the BINAR(1)s, that is, they are compared with the corresponding models when α = 2 for some real datasets.Also, the empirical importance of the proposed two BINAR(1) processes is proved using a real data set.

Football data
Here we consider the real bivariate count data which is reported in Lee and Cha (2015), the Italian Series A football match score played between two Italian football giants "ACF Fiorentina " and "Juventus" during the period 1996 to 2011.For comparison purposes, we do two tests of hypothesis for both the discussed bivariate distributions against the corresponding situations when α = 2.That is, for both cases, we test the hypothesis For the first case, to test H 0 : BBPL distribution versus H 1 : BPGL distribution, we use the For the second case, to test H 0 : BSPL distribution versus H 1 : SPGL distribution, we use the likelihood ratio test statistic whose value is 10.6496 (p-value = 0.0011).Hence, the model BSPL is rejected in favour of the proposed model SPGL at any level > 0.0011.

Criminal data
Here, the empirical importance of the proposed BINAR(1)s are studied in detail using real bivariate time series count data.For the first case, for testing H 0 : BINAR(1)BBPL process versus H 1 : BINAR(1)BPGL process, we use the likelihood ratio test statistic whose value is 380.6648(p-value = 0.000).Hence, the model BINAR(1)BBPL is rejected in favor of the proposed process BINAR(1)BPGL at any level > 0.000.
The ACF plots for the standardized residuals for both the time series are plotted in Figure

Concluding remarks
Two novel bivariate distributions, namely, the BPGL and SPGL distributions, are introduced in this paper.Their mathematical properties are derived.The respective model parameters are estimated using the ML approach.The simulation results based on these novel bivariate models yield consistent estimates under both distributions.Moreover, the BINAR(1)BPGL and BINAR(1)SPGL processes are constructed with BPGL and SPGL paired innovations.The parameters of the BINAR(1)s are then estimated using the method of CML.The small and large sample performances of the established BINAR (1)s are studied and the simulated mean results confirm the consistency of the estimates.For the models that we have established (that is, BPGL, SPGL, BINAR(1)BPGL and BINAR(1)SPGL), we have used the likelihood ratio test for testing the aptness of these models against the models arising from the situation when α = 2 corresponding to each of the proposed models (that is, BBPL, BSPL, BINAR(1)BBPL and BINAR(1)BSPL).We used real datasets for that purpose and we found that our models are better than the compared models.Further, the BINAR(1)s are applied to analyze the Pittsburg data and the results reveal that the BINAR(1)SPGL process provides better model adequacy measures than some other recently proposed BINAR(1) models.Hence, the BINAR(1)s with the TPPGL distribution as innovations can be considered among the commendable bivariate time series models and compete against the existing ones in the literature.Their applications are also subject to the nature of the data.

Figure 28 :
Figure 28: ACF of the standardized residuals of CDRUGS and CSHOTS data for the BI-NAR(1)SPGL process

Table 1 :
Estimates and model adequacy measures of the BINAR(1) models