Multivariate Generalized Linear Mixed Models for Count Data

Univariate regression models have rich literature for counting data. However, this is not the case for multivariate count data. Therefore, we present the Multivariate Generalized Linear Mixed Models framework that deals with a multivariate set of responses, measuring the correlation between them through random effects that follows a multivariate normal distribution. This model is based on a GLMM with a random intercept and the estimation process remains the same as a standard GLMM with random effects integrated out via Laplace approximation. We efficiently implemented this model through the TMB package available in R. We used Poisson, negative binomial (NB), and COM-Poisson distributions. To assess the estimator properties, we conducted a simulation study considering four different sample sizes and three different correlation values for each distribution. We achieved unbiased and consistent estimators for Poisson and NB distributions; for COM-Poisson estimators were consistent, but biased, especially for dispersion, variance, and correlation parameter estimators. These models were applied to two datasets. The first concerns a sample from 30 different sites collected in Australia where the number of times each one of the 41 different ant species was registered; which results in an impressive 820 variance-covariance and 41 dispersion parameters estimated simultaneously, let alone the regression parameters. The second is from the Australia Health Survey with 5 response variables and 5190 respondents. These datasets can be considered overdispersed by the generalized dispersion index. The COM-Poisson model overcame the other two competitors considering three goodness-of-fit indexes. Therefore, the proposed model is capable of dealing with multivariate count data, and measuring any kind of correlation between them taking into account the effects of the covariates.


Introduction
It is well known that the Poisson distribution is the most popular model to deal with count data under the framework of the generalized linear models (GLM) (Nelder and Wedderburn 1972).However, it is limited to equidispersed count data, i.e., when the mean of the response variable is equal to the variance.As alternatives to the Poisson model, the statistical literature for univariate count data is rich, either from overdispersed or underdispersed perspectives.For example, negative binomial (NB) type II distribution parameterized on mean and dispersion parameter known as the quadratic parametrization of the variance (Winkelmann 2008), hurdle and zero-inflated models (Zeileis, Kleiber, and Jackman 2008) and mixed Poisson regression (Winter and Bürkner 2021) are frequent options to model overdispersed count data; Gamma Count (Zeviani, Ribeiro Jr, Bonat, Shimakura, and Muniz 2014), Conway-Maxwell-Poisson (COM-Poisson) (Shmueli, Minka, Kadane, Borle, and Boatwright 2005) and the Extended Poisson Tweedie (Bonat, Jørgensen, Kokonendji, Hinde, and Demétrio 2018) based on the Poisson Tweedie distribution (El-Shaarawi, Zhu, and Joe 2011; Jørgensen and Kokonendji 2016) can handle all three situations (under-, equi-, over-dispersed), even though the computation of their probability mass function (pmf) relies on numerical methods.
On the other hand, the bibliography for multivariate data is scarce.However, there is an increasing demand from researchers to analyze datasets with over one response variable.The benefit of it is to better investigate the relationship between the response variables (Bonat and Jørgensen 2016); at the same time that numerical methods can take advantage of it once more data is available to estimate model parameters.Some methodologies to deal with multivariate data have been proposed and introduced by Winkelmann (2008) with a focus on building multivariate distributions.However, it comes with the price of some practical limitations.The multivariate Poisson-lognormal regression (MPLR) and the latent Poisson-normal regression models admit only overdispersed data.The multivariate Poisson-gamma mixture model (MPGM) and multivariate NB model (MNBM) are suitable only for overdispersed data with positive correlations.Another example is the copulas framework, which allows the building of multivariate distributions.But a negative correlation between many response variables is difficult to model, especially for count data (Nikoloulopoulos and Karlis 2009).Inouye, Yang, Allen, and Ravikumar (2017) proposed three alternatives to deal only with equi and overdispersed data constructed either via copulas or a mixture of distributions.
Distributions with no practical limitations were also developed.Famoye (2015) proposed a multivariate generalized Poisson regression model that can deal with any kind of dispersion and correlation with an estimation based on the maximum likelihood (ML) paradigm.In a similar way, Muñoz-Pichardo, Pino-Mejías, García-Heras, Ruiz-Muñoz, and González-Regalado (2021) proposed a multivariate conditional Poisson regression model, where the relationship between response variables is measured by a coefficient in the linear predictor.In its turn, the dependence between response variables is conditional on the other response variables.
A very flexible modeling framework based on estimating functions, the Multivariate Covariance Generalized Linear Models (MCGLM) can also fit such data (Bonat 2016).It uses only second-order moments assumptions and they estimated the parameters based on quasilikelihood (Wedderburn 1974).It can accommodate correlated data based on an approach similar to the generalized estimating equations (GEE) (Liang and Zeger 1986), allowing both multivariate responses and correlated data.We can also cite methodologies based on Bayesian inference, such as MCMC Generalized Linear Mixed Models via MCMCglmm package (Hadfield 2010) and Bayesian Regression Models using Stanbrms package (BÜrkner 2018), which comes with the price of a greater computational time.We are not going to discuss Bayesian models in this article.
Another alternative is to model the correlation between response variables in the same individual using the class of hierarchical GLM (Lee and Nelder 1996).This class allows to model of correlated variables or individuals via a random effect, an unobserved variable, that can follow any distribution.When the distribution of the random effect is Gaussian, we have the Generalized Linear Mixed Models (GLMM).However, GLMM is widely known and used to model the correlation between sample units, not for response variables, such a method is implemented in consolidated packages in software R (R Core Team 2020), such as glmmTMB (Brooks, Kristensen, van Benthem, Magnusson, Berg, Nielsen, Skaug, Maechler, and Bolker 2017), lme4 (Bates, Mächler, Bolker, and Walker 2015) and nlme (Pinheiro, Bates, DebRoy, Sarkar, and R Core Team 2017).
In this article, we propose to model multivariate count data under the framework of GLMMs to accommodate the correlation between response variables.Parameter estimates are obtained through the maximum likelihood method (Aldrich et al. 1997).We use multivariate normally distributed random effects to accommodate the correlation between response variables.The estimation process is similar to the one for GLMM and was implemented in R (R Core Team 2020) through TMB package (Kristensen, Nielsen, Berg, Skaug, and Bell 2016).
Here, we will focus on multivariate overdispersed data, whereas da Silva, Laureano, Petterle, Júnior, and Bonat (2022) presented a similar approach focused on underdispersed data.This modeling approach for multivariate count data is evaluated for three different distributions of the response variables: Poisson, NB, and COM-Poisson mean parameterized (Huang 2017).Even though COM-Poisson does not belong to the exponential family, we refer to it as a GLMM framework, once the estimation process remains the same regardless of the distribution being used.This article contains six sections, including this introduction.Section 2 describes the datasets used to provide illustrative applications of the model.Section 3 proposes the multivariate generalized linear mixed model (MGLMM) model.Section 4 shows the result of the simulation study to assess the estimators' properties.Section 5 presents the results of the model applied to the datasets presented in Section 2. Finally, Section 6 discusses the major contributions of this article and future work.

Dataset I: Australian Health Survey
The AHS is the largest survey conducted in Australia concerning health issues.The data used here is available through the mcglm package (Bonat 2018) in the ahs object.The main aim of this study is to investigate whether more access to health care services and demographic covariates such as sex, age, and income are related to the number of times patients use health services.
The data has 5190 respondents and 15 variables; of which 10 were considered covariates and 5 as count response variables of a cross-sectional study with individuals over 18 years.The five response variables are Ndoc (Number of consultations with a doctor or specialist), Nndoc (Number of consultations with health professionals), Nadm (Number of admissions to a hospital, psychiatric hospital, nursing or convalescence home in the past 12 months), Nhosp (Number of nights in a hospital during the most recent admission) and Nmed (Total number of prescribed and non-prescribed medications used in the past two days).Table 6 in the Appendix A gives the description of each covariate.
Figure 1 shows the barplot for each response variable.The figure suggests that all variables have some overdispersion, because of the right heavy-tailed and perhaps some excess of zeros.We highlight that such interpretations are marginal in the sense it does not take into account the effects of the covariates.Table 1 presents some descriptive statistics for the 5 response variables.It is seen a positive (small in most cases) correlation between the 5 response variables, which is acceptable once they were measured in the same individual; except between Nadm and Nhosp, where the correlation is 0.996.Moreover, the mean and variance relation-ship between the variables shows overdispersion for all variables, which Nhosp being the most overdispersed.This is characterized by Fisher Dispersion Index (DI) greater than 1 (Fisher 1934).Also, the generalized dispersion index (GDI) (Kokonendji and Puig 2018) classifies this dataset as overdispersed once it is greater than 1 and its 95% confidence interval does not contain zero.However, these results should be confirmed by model fitting.

Dataset II: Abundance ant species
This data was obtained from a study conducted in south-eastern Australia by Gibb, Stoklosa, Warton, Brown, Andrew, and Cunningham (2015), and it is available in the software R (R Core Team 2020) throughout the mvabund package (Wang, Naumann, Eddelbuettel, Wilshire, and Warton 2020).The study consisted of counting the number of 41 different ant species that fell into pitfall traps during 18-day sessions in 30 different sites in south-eastern Australia between November 2007 and April 2008.The primary interest of this dataset is to investigate the relationship between the environmental variables with the occurrence of different ant species and the co-occurrence of ant species.The response variables considered are the occurrence of each of the 41 species at the 30 sites, while the covariates comprise 5 environmental variables and we gave their full description in  Table 2 presents the mean, variance, and DI for every response variable and the GDI for the dataset.The variance is higher than the mean almost for all variables, except for Polyrhachis and Solenopsis; which is described by a DI>1.This dataset is classified as overdispersed based on the GDI index = 11,543 and the lower bound of a 95% confidence interval was greater than 1.

Multivariate generalized linear mixed model (MGLMM) for count data
This section follows the same description in da Silva et al. (2022) and presents the MGLMM model to completeness; even though the explanations are the same, but succinct.The fundamental difference is that in this article we will explore the model for overdispersed counts.
In summary, we extended the standard GLMM (Breslow and Clayton 1993) for multiple responses.
Let Y ij be a random variable for i = 1, . . ., n independent subjects and r = 1, . . ., k response variables.Also, consider that x irj for each subject i and response variable r is the value of the j-th covariate of k known covariates.We first specify a standard GLMM joint model with only a random intercept: where the conditional distribution of Y ir on the random effects follows a same distribution f allowing different mean and dispersion.Next, the linear predictor: g r (µ ir ) = x ⊤ irj β r + b ir , where g r (.) is a suitable link function, β r is a p×1 vector of parameter estimates and b ir is the random intercept value for subject i and response r.Finally, the random effects distribution is specified as: , where each random effect has mean 0, variance σ 2 , and correlation ρ rr ′ (r ̸ = r ′ ) between each pair of random effects.This framework is general and can be applied to any distribution f and link function g r (.).Nevertheless, as we are dealing with count data, we considered only Poisson, binomial negative type II and COM-Poisson distributions, and a logarithm link function.Extending this model for different distributions for each response (f r ) is also possible, but we will not address it.
Maximum likelihood is the estimation procedure, and it is fully described in da Silva et al. (2022).We integrated out the random effects via numerical integration using Laplace approximation (Tierney and Kadane 1986).The Newton-Raphson method was efficiently implemented to perform the inner optimization.We optimized the marginal likelihood using first-order derivatives methods, such as BFGS and OPTIM routines available in the R software.
The derivatives of the joint and marginal likelihood were obtained through automatic differentiation (Baydin, Pearlmutter, Radul, and Siskind 2017).This was efficiently implemented by the TMB package, which provides C++ templates where an objective function must be supplied, in this case, the log-likelihood function.
The variability of the random variable in this model is being measured by two parameters, the variance of the random effect and the dispersion of the pmf.A crucial point of the model is the great flexibility to learn these two types of variances with no identifiability problems.

Simulation studies
In this section, we present simulation studies to assess the properties of the MLE estimators (bias and consistency).We considered a bivariate regression model for count data.We designed 12 simulation scenarios with four different sample sizes, 100, 250, 500, and 1000, and three different correlations between random effects, ρ = −0.5, 0, 0.5.For the regression structure, we considered only an intercept for each response, with β 01 = log(7) and β 02 = log(1.5).The variance of random effects were σ 2 1 = .3and σ 2 2 = .15.The dispersion parameter for NB and COM-Poisson was equal to ϕ = 1 and ν = .7 respectively, which induces a small overdispersion.
We generated 150, 200, and 300 datasets for Poisson, COM-Poisson, and NB distribution for each design.The primary idea was to generate 100 datasets for each distribution.However, as it did not return the SE of the parameter estimates in every repetition, it was necessary to increase the number of datasets generated proportionally to the number of SE failures for each distribution to obtain at least 100 valid estimations.The following three subsection presents the results for Poisson, NB, and COM-Poisson scenarios.Figure 5 presents the coverage rate for each parameter by sample size and simulation scenario for the bivariate Poisson regression model.Overall, all empirical coverage rates are close to the nominal level of 95%, varying between 90% and 98% approximately.In special, the coverage rate of regression parameters β 0r are slightly greater than the nominal level.For the variance of random effects σ r , the coverage rate is slightly smaller than the nominal level.

Poisson scenario
Finally, for the correlation between random effects ρ, the coverage rate is slightly lower when ρ = .5,and slightly greater when ρ = {0, −0.5} compared to the 95% nominal level.
Even for the Poisson distribution, which is the simplest case to estimate because there is no dispersion parameter, there were 13 out of 1800 (3×4×150) simulations that did not return SE for some parameters estimates or produced extreme values due to large SEs that were not considered into the results.We used the PORT algorithm to estimate the model because it was more stable and faster than BFGS in most situations.It also happened in Kristensen et al. (2016), where 9 study cases from different model settings ranging from linear regression to multivariate stochastic volatility models were considered, and PORT had a better performance than BFGS.

Negative Binomial scenario
Figure 6 suggests that for all simulation scenarios, both the average bias and SE tend to be 0 as the sample size increases.This suggests the consistency and unbiasedness of the MLE estimator for large samples.Concerning the variance of the random effect σ 1 and σ 2 were slightly underestimated.

Sample size
Coverage rate Overall, Figure 7 shows that the empirical coverage rates are close to the nominal level of 95%.In particular, the coverage rate of regression parameters β 0r , the variance of random effects σ r and the correlation between random effects ρ are slightly greater than the nominal level.On the opposite, there was a coverage rate close to 80% for ρ estimated when the sample size was equal to 100 and ρ = 0. Estimation problems were more severe for the bivariate NB regression model compared to the Poisson and occurred in 740 out of 3600 (3×4×300) simulations (20,6%).It was also necessary to remove those iterations when the dispersion parameter ϕ was greater than 5 from the results (we simulated at 1).

COM-Poisson scenario
Figure 8 shows that, for all simulation scenarios, the SE tends to be 0 as the sample size increases, but the bias does not.This suggests the non-consistency of the MLE estimator.Moreover, almost every estimate is biased.While ν 2 is overestimated, ν 1 is underestimated (and the bias does not decrease with an increase in sample size).The correlation parameter ρ has a typical behavior: when the data was simulated with ρ = 0 there was almost no bias, for ρ = .5 a negative bias and for ρ = −.5 a positive bias; the model forces the correlation parameter towards zero.The standard deviation of random effect σ 2 is overestimated, while σ 1 is slightly underestimated.Regarding the regression parameters, β 02 is slightly underestimated and β 01 suggests a negligible bias.
An interesting possible relationship among parameters is that when ν was overestimated (making the model more underdispersed: ν > 1), σ was also overestimated (increasing the variance of the model).In contrast, when ν was underestimated (making the model even more overdispersed: ν < 1), σ was also underestimated (decreasing the variance of the model).It seems that when ν makes the dispersion of the model greater, σ makes the variance lower, and vice versa.It may be related because of a possible non-orthogonality between these parameters.
Overall, Figure 9 shows that empirical coverage rates are not close to the nominal level of Bias ± SE 95%, which agrees with the results presented in Figure 8, where the bias did not decrease even for a higher sample size.In particular, the coverage rate of σ 2 had the worst results among all parameters (the coverage rate decreases as sample sizes increase), followed by ν 2 .Not surprisingly, these parameters had the two largest biases in Figure 8.In contrast, results for β 01 , σ 1 and ρ (when the data was generated with ρ = 0) had coverage rates close to 95% level.
Estimation problems were more severe for the COM-Poisson regression model compared to the Poisson and less severe if compared to NB, accounting for 245 (10,2%) out of 2400 (3×4×200).
Besides the rules used for Poisson to classify extreme values, it was necessary to remove those models when the dispersion parameter ν was greater than 4 and the SE of ρ estimate was greater than 2.

Data analyses
This section presents the data analyses of the two datasets presented from section 2. Initial values for the model in section 3 were chosen carefully in the following way.The estimation process started with a simple random sample (SRS) of size 300 (seed 2390) for AHS data.The parameters obtained from the sample were used as initial values to fit the model to the whole dataset.For ANT data, we did not use this step because of the small size of the dataset (30).We then optimize the same model more than once iterating over optimizers.In the first round we used PORT routines and its final estimates were used as initial estimates to optimize the same model with BFGS; in its turn, the final estimates from BFGS were used as initial estimates to PORT; and another final round to PORT.We used this rationale because these models are complex to estimate and this procedure produced reasonable results.
Besides the full model presented in section 3, we also tried to estimate four simpler versions to check whether the model was over-specified.We made the simplifications either by fixing values from the Σ or the dispersion parameter.The first one aimed to reproduce independent response variables setting the correlation parameter to 0. In the second version, we fixed ϕ = 1 and ν = 1.5 for NB and COM-Poisson models inducing small overdispersion and underdispersion, respectively.In the third simplified model, we set the variance to 1.The last strategy used only a single variance parameter to estimate all response variables.The last three simpler versions described aimed to answer whether we can estimate both dispersion and variance parameters in the same model.We only tested them for NB and COM-Poisson models since they have dispersion parameters.
The reported results are only from the full data and the best model version for each distribution after removing covariates that did not have the SEs estimated.We then presented the maximized log-likelihood value, Akaike and Bayesian Information Criterion (AIC and BIC), and the number of parameters estimated (np).The COM-Poisson full model was the best among all distributions and the simpler versions of each distribution.

AHS data
Figure 10 presents the regression parameter estimates and 95% confidence intervals by outcome and final model.We can see that the confidence interval amplitude is almost the same for all models, but slightly smaller for the COM-Poisson models over the competitors for almost all estimates for the Ndoc response variable; for the other response variables, we saw little differences.The point estimates were very close between Poisson and NB models, with a difference for the COM-Poisson model in some estimates, for example, freepoor, age, sex, illness, and chcond limited for Ndoc.We may relate this to the bias found in Figure 8.The expected result would be ϕ and ν approaching zero, which results in an overdispersion of both distributions.However, as explored in subsection 5.2, this behavior may occur because of the variance of the random effect.For the COM-Poisson model, the only ν estimate which indicates overdispersion was for Nmed, which had a sample DI equal to 1.99 (small overdispersion).We presented the correlation coefficient estimates and their SEs in parentheses in (1) for the COM-Poisson model.Among the 10 correlation coefficients returned, COM-Poisson had 6 significant correlation coefficients.The stars in the matrix represent significant coefficients at the 5% level.Even though we do not have a primary interest in interpreting each correlation coefficient estimate, it is important to know which of them is significant, because it may be related to a smaller SE.
We can see an almost perfect significant correlation between Nadm and Nhosp, and a strong correlation between Nmed and Ndoc.We had also seen this pattern in Table 1.

ANT data
We presented the best model for each distribution in Table 5.Here, we included two models for the COM-Poisson distribution because they were equivalent in the intermediate models based on the adopted criterion.From Table 5 we can see that the best models were from COM-Poisson distribution; Poisson and NB models had similar logLik results.The COM-Poisson models had higher logLik and BIC, and smaller AIC than the fixed dispersion model.A logLik ratio test (LRT) between these two COM-Poisson models resulted in p < .00001(χ 41 = 132.84),which gives evidence in favor of the fully specified model.Therefore, we will present the estimates of the full models for each distribution.
Figure 11 presents the regression parameter estimates and 95% confidence intervals by outcome and final model for the first 12 response variables.We presented the same graphic for the remaining response variables in the Appendix B, once we found similar patterns for all response variables.First, we can see that not all response variables share the same linear predictor (covariate feral mammal dung is not presented in the second, third and ninth response variables).Second, there were still some regression estimates' SEs that were not returned by the model and are presented by a hollow circle.It was necessary to make the distinction when the SE was returned and when it was tiny (filled circle, such as bare ground for COM-Poisson and response variables 1-8 and 10-12).
Overall, the confidence intervals for the COM-Poisson model were smaller than its counterparts and the point estimates were close among all models.The feral mammal dung covariate had the largest confidence intervals compared to the other covariates.There are still regression coefficients that are very close to zero, suggesting that a better variable selection method may improve the model fit.The stars in the graphic represent the significant coefficients at the 5% level.The correlation patterns presented in this figure differ somewhat from the ones found in the marginal correlation.It shows the importance of calculating the correlation coefficient in a model that accounts for the effects of the linear predictor.For example, the sample correlation was nearly zero between responses 1 and 2, 2 and 3; while the COM-Poisson correlation between the random effect of these two response variables was negative and significant at 5% level, for NB only one was significant, and for Poisson neither was significant possibly because of a high SE, once the ρ was equal to -.7 between the random effects of Y1 and Y2, and -.54 between the random effects of Y2 and Y3.

Discussion
The focus of this article was to propose MGLMM for count data.The great advantage of this model is the ability to deal with multiple outcomes.We specified the framework based on a GLMM with a random intercept, where we structured a joint model whose random effect follows a multivariate normal distribution.As a result, we can measure the variance and correlation of the random effects motivated by the multivariate set of responses.The distributions used for variables of counts were Poisson, NB, and COM-Poisson.The estimation process is the same as a GLMM model.We used the TMB package to implement the model once it provides state-of-art C++ libraries that handle automatic differentiation, CppAD (Bell BM 2005), linear algebra, Eigen C++ (Guennebaud, Jacob et al. 2010) and parallel computation, BLAS (Blackford, Petitet, Pozo, Remington, Whaley, Demmel, Dongarra, Duff, Hammarling, Henry et al. 2002), among others.
In order to evaluate the properties of the ML estimators, we conducted simulation studies for each distribution, considering three different values for the correlation parameter and four different sample sizes.They were all evaluated through average bias and confidence interval based on the average SE and coverage rate with a nominal level of 95%.For Poisson distribution, we achieved unbiased and consistent estimators for large samples with intervals for bias ranging at most between (-.2;.2), except for sample size equal to 100 and values for the parameter ρ.The coverage rate was close to 95% in all scenarios considered, varying between 90% and 98%.We evidenced a greater variability for NB compared to the Poisson distribution, especially because of the dispersion parameter ϕ in NB, necessary to model overdispersion.For NB distribution, we also achieved unbiased and consistent estimators for large samples with bias intervals ranging in most cases between (-.5;.5).In particular, we noticed a greater confidence interval width for the correlation and ϕ 2 parameter estimates; and small bias for σ 2 and ν 2 parameter estimates.The coverage rate in most cases was equal to or greater than 95%; with a coverage rate below 80% for the correlation parameter when ρ = 0 and the sample size equal to 100.
For the COM-Poisson model, the parameter estimators were neither consistent nor unbiased.The regression parameter was underestimated for β 02 and showed no bias for β 01 .The correlation parameter ρ was always estimated towards zero: when ρ = −.5 it was overestimated, ρ = +.5 was underestimated, and for ρ = 0 it showed no bias.The standard deviation of random effect σ 2 was overestimated while σ 1 was underestimated.This behavior may be correlated to the dispersion parameter ν: when more variance was captured from σ 2 less dispersion was captured from ν 2 ; on the other hand, when less variance was captured from σ 1 , more dispersion was captured from ν 1 .
As most parameters were biased, the coverage rate was not close to 95% in all scenarios.For σ 2 , β 02 and ν r the coverage rate were between 70% and 90%; for ρ the coverage rate were close to 80% in 2 out of 3 scenarios (ρ = {−0.5,0.5}).The other parameters, β 01 , σ 1 and ρ when ρ = 0 had a coverage rate close to 95%.
After that, the two datasets were analyzed by each model and variation of them: no correlation, fixed dispersion, fixed variance, and common variance for all random effects.The first dataset was from the AHS survey with five random variables and 5190 participants.
According to the fit measures used, the COM-Poisson model provided the best fit.The SE of β and ρ estimates were similar among the COM-Poisson, NB, and Poisson models.The ν parameter captured 4 out of 5 underdispersed response variables and one overdispersed response variable, followed by a small σ.The COM-Poisson model produced more significant correlation values than its counterparts.
The second dataset was the ANT, which contains 41 response variables that count for the number of ANT species at 30 sites in Australia.The multivariate response can be considered as overdispersed by the GDI.The COM-Poisson model was also the best model regarding AIC and logLik; the model with the best BIC was the COM-Poisson with fixed dispersion.In almost all comparisons, the SEs of the COM-Poisson model was smaller than the other models.The ν parameter was greater than 1 for all response variables, indicating underdispersion.
Therefore, we suggest using the MGLMM model framework for count data.In particular, the best results were obtained with the COM-Poisson model in two real datasets.The major advantage of it is the possibility of modeling all response variables at the same time and measuring the correlation between the random effects.
The estimation process of this model was computationally intensive, being a computational challenge to implement the model.For example, the estimation of the COM-Poisson model

Figure 2 :
Figure 2: Barplot for the occurrence of each ant species genus from ANT data

Figure 3 :
Figure 3: Correlogram of ANT species occurrence using spearman correlation

Figure 4 Figure 4 :
Figure4presents the average bias and its confidence interval based on the average SE by sample size and simulation scenario for the bivariate Poisson regression model.It shows that, for all simulation scenarios, both the average bias and SE tend to be 0 as the sample size increases.This suggests the consistency and unbiasedness of the MLE estimator for large samples.

Figure 11 :
Figure 11: Regression parameter estimates and 95% confidence intervals by outcome and final model for ANT data

Figure 13 :
Figure 13: Regression parameter estimates and 95% confidence intervals for outcomes 13-24 and final model for each distribution

Figure 14 :
Figure 14: Regression parameter estimates and 95% confidence intervals for outcomes 25-36 and final model for each distribution

Figure 15 :
Figure 15: Regression parameter estimates and 95% confidence intervals for outcomes 37-42 and final model for each distribution Table7in Appendix B. The name of the response variables starts with an index number (1,...,41) followed by the abbreviated name of each species.Figure2presents the barplot for each response variable from ANT data.The 41 different species present high variability.Some species were only seen one time in a single site, such as Polyrhachis and Solenopsis, while Iridomyrmex and Pheidole were seen 20 times on over one site.

Table 2 :
Mean, variance, dispersion index (DI)for each genus for ANT data.Generalized dispersion index (GDI) and it standard error (SE) equals to 11.54(.92)

Table 2
.2 explores how the occurrence of different species is related to each other by the Spearman correlation in a correlogram.The marginal correlation ranges from -.58 up to .74, which is well distributed along all potential values of the correlation parameter ρ = (−1, 1).

Table 3 :
Table3presents the goodness-of-fit measures for AHS data from different distributions and specifications.Goodness-of-fit measures for AHS data from different distributions and specifications.Number of parameters estimated (np), Akaike and Bayesian Information Criterion (AIC and BIC), and log-likelihood value (Loglik).

Table 4
presents the dispersion estimates for each model and outcome.Even though this data can be considered as marginally overdispersed according to the GDI = 17.94 presented in Table1, we see that ϕ approaches the infinity (suggest an equidispersed model by NB distribution), and ν is greater than 1 for all response variables and indicates underdispersion.

Table 4 :
Dispersion of parameter estimates and standard errors (SEs) for each model and outcome of the Australian Health Survey (AHS) data

Table 5 :
Goodness-of-fit measures for ANT data from the best parametrization for each distribution.Number of parameters estimated (np), Akaike and Bayesian Information Criterion (AIC and BIC), and log-likelihood value (Loglik).