On Diagnostics in Bell–Touchard Regression Models

The Bell–Touchard regression model for count response variables was introduced recently in the statistical literature. This regression model for counts is, in some aspects, similar to the generalized linear model framework, and the mean response is related to covariates and unknown regression coefficients, which allows for parameter interpretation in terms of the response variable in its original scale. However, diagnostic methods for this class of regression models were not developed. In this paper, we fill this gap and propose a variety of diagnostic tools, such as leverage measures, global influence, normal curvatures of local influence under some perturbation schemes, and Pearson residuals. All diagnostic measures developed are applied in a real data set to analyze the fitted Bell–Touchard regression model in practice.

On the basis of the mean-parameterized BeTo distribution, a new regression model was introduced by Lemonte (2022) when the response variable is a count.The mean response was related to explanatory variables and regression parameters, which allows for parameter interpretation in terms of the response variable in its original scale.In the following, we briefly discuss the BeTo regression model.Let Y 1 , . . ., Y n denote n independent random variables BeTo distributed, where Y i ∼ BeTo(µ i , ϕ) for i = 1, . . ., n.Also, consider the functional relation for the mean parameter given by log(µ i ) = x ⊤ i β, where x ⊤ i = (x i1 , . . ., x ip ) are observations on p known regressors, and β = (β 1 , . . ., β p ) ⊤ ∈ R p is a vector of regression parameters (p < n).Note that µ i = exp(x ⊤ i β), which ensures that µ i > 0 for all i = 1, . . ., n.In general, we have that x i1 = 1 (for all i = 1, . . ., n), and so β 1 corresponds to the intercept.Lemonte (2022) showed that the maximum lilekihood (ML) estimates β = ( β 1 , . . ., β p ) ⊤ and ϕ of β = (β 1 , . . ., β p ) ⊤ and ϕ, respectively, can be obtained by iteratively solving simultaneously the equations where , and Also, tr(•) denotes the trace operator, and m = 0, 1, . . .means the iteration counter in the iterative process (1).Let θ = ( β ⊤ , ϕ) ⊤ be the ML estimate of θ = (β ⊤ , ϕ) ⊤ .Asymptotic properties of the ML estimators of the BeTo model parameters were also discussed by Lemonte (2022), and the author showed that θ ∼ N p+1 (θ, K −1 θ ), where K θ = diag{K β , K ϕ } and so β , 1/K ϕ }, and K β = X ⊤ V X and K ϕ = tr(P ).It is interesting to note that β and ϕ are globally orthogonal (Cox and Reid 1987) and, hence, the ML estimators β and ϕ are asymptotically independent.From the above asymptotic distribution, we have that VAR( β) = (X ⊤ V X) −1 , and VAR( ϕ) = 1/tr(P ), and so asymptotic confidence intervals for the BeTo model parameters can be computed in a usual manner.The reader is referred to Lemonte (2022) for more details about the BeTo regression model.
It is evident that the parametric BeTo family of discrete distributions corresponds to an interesting, tractable, and useful distribution for count data.In particular, real data illustrations presented in Lemonte (2022) have indicated that the BeTo regression model is very useful in practice and can be an interesting alternative to the well-known Poisson and negative binomial regression models.On the other hand, Lemonte (2022) has not studied diagnostic techniques in order to detect possible influential observations under the BeTo regression model.These possible influential observations may change, for example, the inference on a particular regression parameter of the BeTo regression model fitted to the data at hand.Therefore, it is quite important to consider diagnostic techniques to identify these observations under the BeTo regression model.This paper fills this gap and develops diagnostic methods for this regression model.First, we shall derive a leverage measure to identify observations with high influence on their own predicted values.Next, we shall consider the global influence starting from the case deletion proposed by Cook (1977), and then we consider the local influence method (under different perturbation schemes) developed by Cook (1986) to detect possible influential observations under the BeTo regression model.It is worth stressing that these diagnostic methods are quite applied in the statistical literature, mainly in parametric regression models, to identify possible influential observations; see, for example, Svetliza and Paula (2001), Paula (2013), Ibacache-Pulgar, Paula, and Galea (2014), Zhu, Shi, and Liu (2015); Zhu, Liu, and Shi (2016), Ferreira, Paula, and Lana (2022) and Aoki, Bustamente, and Paula (2022), among many others.We shall also define the Pearson residuals to assess departures from the assumptions made for the BeTo regression model, particularly for the error assumptions, as well as to detect outlying observations.Finally, the diagnostic procedures developed in this paper are applied in a real data set to analyze the fitted BeTo regression model.

Leverage
Here, we follow Pregibon (1981) and provide a leverage measure for the BeTo regression model.Basically, the idea behind the concept of leverage is that of evaluating the influence of each observed response on its own predicted value.At the convergence of the iterative process (1), we have that Therefore, the ML estimate β can be interpreted as the least squares solution of the linear regression of V 1/2 z against the columns of V 1/2 X.The projection matrix of the least squares solution of the linear regression of z against X with weights V is given by which suggests the use of the elements of the main diagonal of H, say h ii , to detect the presence of leverage points in this weighted linear regression model (Pregibon 1981).Note that h ii can be expressed as Hence, we use h ii as a leverage measure for the BeTo regression model.Therefore, the plot of h ii against the index of the observations may reveal observations with a high influence on their own predicted values.In addition, observations with large values for h ii may also be influential on the estimated regression coefficients (see, for example, Svetliza and Paula 2001).

Global influence
In the following, we consider the case deletion approach (Cook 1977) to assess the influence of the i-th case on the ML estimate θ = ( without the i-th observation in the sample.The global influence measure, i.e. the (generalized) Cook distance, to study the effect of dropping the i-th case from data on the ML estimate θ = ( β ⊤ , ϕ) ⊤ can be defined by It is worth stressing that one can avoid the estimation of θ ⊤ for all i = 1, . . ., n by employing the one-step approximation given by θ (−i) ≏ θ − K −1 θ l i , where and Note that the total log-likelihood function is simply given by ℓ(θ) = n i=1 ℓ i (θ).We have that ∂ℓ where R i is the Pearson residual given by Thus, we have that where and so It then follows that It is immediate to note that we can separate this distance into two components: , where C i (β) corresponds to the β-component, and C i (ϕ) corresponds to the ϕ-component, with Hence, the plot of C i (θ), C i (β) and C i (ϕ) against the index of the observations may reveal possible influential observations on the ML estimates θ = ( β ⊤ , ϕ) ⊤ , β and ϕ, respectively.It is interesting to note that C i (β) = R 2 i h ii , and so a case with a large value of Pearson residual (in absolute value) and/or a large value for h ii may be an influential observation on the ML estimate β = ( β 1 , . . ., β p ) ⊤ .

Local influence
As early mentioned, the total log-likelihood function of the BeTo regression model is given by ℓ(θ) = n i=1 ℓ i (θ), where θ = (β ⊤ , ϕ) ⊤ is the (p + 1)-vector of parameters, and ] denotes the contribution of the i-th observation.Let ℓ(θ|ω) denote the perturbed log-likelihood function under a particular perturbation scheme applied in the model or data, where ω is a q-vector of perturbations restricted to some open subset Ω ⊂ R q .Let ω 0 ∈ Ω denote the no-perturbation vector so that ℓ(θ|ω 0 ) = ℓ(θ).To measure the discrepancy between ℓ(θ|ω) and ℓ(θ), one can consider the likelihood displacement given by LD(ω) = 2[ℓ( θ) − ℓ( θ ω )], where θ ω denotes the ML estimate of θ obtained from the perturbed log-likelihood function ℓ(θ|ω).The local influence method developed by Cook (1986) considers the influence of small perturbations in the model or data on the measure LD(ω).Basically, the main idea of the local influence approach is to study the normal curvatures for β and ϕ in the unitary direction, d say, available at θ = ( β ⊤ , ϕ) ⊤ and ω 0 .Such curvatures, for large n, can be expressed, respectively, in the forms (Cook 1986) where K β = X ⊤ V X, K ϕ = tr( P ), and Note that ∆ β and ∆ ϕ are p×n and 1×n matrices, respectively.These matrices are evaluated at the ML estimate θ = ( β ⊤ , ϕ) ⊤ , and at the no-perturbation vector ω 0 .

Residual analysis
The aim of the residual analysis is to identify departures from the model assumptions, as well as to assess the overall goodness-of-fit of the fitted model to the data at hand.A simple as well as natural residual is the well-known Pearson residual early mentioned, which is based on the idea of subtracting off the mean and dividing by the standard deviation, that is, Note that E(R i ) = 0, and VAR(R i ) = 1.However, in practice, we have to replace the unknown quantities with the estimated ones, and so we have that It is worth emphasizing that the exact distribution of the Pearson residual R i is not known under the BeTo regression model.However, as suggested by Atkinson (1981), we may add envelopes into the quantile-quantile plot (QQ-plot) of the Pearson residuals to decide whether the observed residuals are consistent with the fitted BeTo regression model, as well as to detect outlying observations.Hence, if some residual values fall outside the simulated envelope, there is evidence against the adequacy of the fitted BeTo regression model to the data.

Application
As illustration, we shall consider a data set from Brockmann (1996).The data correspond to a study on horseshoe crabs (Limulus polyphemus).Let the number of males surrounding a breeding female (Y ) be the response variable; that is, the number of additional males ("satellites") residing nearby each female that may fertilize her eggs.Explanatory variables are the female crab's color (1-light; 2-medium; 3-dark; and 4-darker); the female crab's spine condition (1-good; 2-middle; and 3-bad); and the female's weight (in Kg).The sample size is n = 173.Boxplots for the satellites according to the female's color and female's spine condition are presented in Figure 1.Note that there are a few extreme observations.We assume Y ijk ∼ BeTo(µ ijk , ϕ) so that log(µ ijk ) = β 1 + β 2 weight i + β 3j + β 4k , where µ ijk is the expected number of additional males ("satellites") corresponding to the j-th female's  color (j = 1, 2, 3, 4) and k-th female's spine condition (k = 1, 2, 3) for i = 1, . . ., 173.Here, we assume that β 31 = β 41 = 0.The ML estimates of the regression parameters, the corresponding asymptotic standard errors (SE), and the p-values are listed in Table 1.The ML estimate of the precision parameter (asymptotic SE between parentheses) is ϕ = 0.1042 (0.0376).The QQ-plot for the Pearson residuals with simulated envelopes (Figure 2) against the theoretical quantiles of the standard normal distribution does not present any unusual features, as well as there are no observations outside the envelope.Therefore, the BeTo regression model seems to be appropriate to model the horseshoe crabs data.From Table 1, note that only the female's weight seems to be statistically significant (based on the 5% nominal level, for instance) on the expected number of additional males ("satellites") that may fertilize her eggs.
We now check the presence of possible influential observations.To do that, we generate plots for the diagnostic measures derived in the previous section.Figure 3 presents the index plot of the leverage measure, where case 141 seems to have a great influence on its own predicted value.This case corresponds to a breeding female crab with 5.2Kg (being the weightiest female crab), medium color, and good spine condition, which shows the difficulty of predicting the response for weigher female crabs.Index plots of C i (β) and C i (ϕ) are displayed in Figure 4. Figure 5 presents the index plot of the local influence under the case-weight perturbation, while Figure 6 presents the index plot of the local influence under the covariate (female crab weight) perturbation.On the basis of these plots, the cases 15,56,117,134,141,146 and 149 have more pronounced influence than the other observations.It is worth mentioning that a common feature among these cases is that all correspond to female crabs which have a great number of males surrounding them.For example, cases 15, 56, and 117 have 14, 15, and 12 males ("satellites") residing nearby the female, respectively, and the sample mean of males is (approximately) 3.
After finding possible influential observations from the generated plots for the diagnostic measures derived in this paper, an interesting problem is to determine how to deal with these influential observations, that is, how to measure the effect of these observations on the parameter estimates or on the fitted model, for example.First, we have dropped each of the outstanding influential observations (cases 15, 56, 117, 134, 141, 146, and 149) and have refitted the BeTo regression model to verify their impact on the parameter estimates.Table 2 lists the relative changes of the ML estimates after removing each of the influential observation from the sample.(The relative change (in %) of an estimate ξ of ξ is given by ( ξ − ξ (−i) )/ ξ × 100%, where ξ (−i) denotes the ML estimate of ξ after removing the i-th observation from the sample.)It is noteworthy that the intercept parameter was the most affected by the influential observations, followed by the parameters regarding the spine condition effect (i.e., β 42 and β 43 ).For example, the elimination of case 15 delivers relative changes (approximately) of −169%, 78%, and −177% for β 1 , β 42 , and β 43 , respectively.However, the elimination of these observations does not change the inference regarding the regression coefficients; that is, the covariate which is statistically significant remains significant, and the covariate that is not statistically significant remains not significant.Next, Table 3 lists the log-likelihood function evaluated at the ML estimates ( ℓ), the Akaike information criterion (AIC), the Bayesian information criterion (BIC), and the Hannan-Quinn information criterion (HQIC) after dropping each of the outstanding influential observations (cases 15, 56, 117, 134, 141, 146, and 149).From Table 3, note that each of the outstanding influential observations has a slight impact on the BeTo regression model fitted to the data at hand; that is, the comparison criteria are near, and there is no notable improvement on the model fitting after removing each of the outstanding influential observations in comparison which the fitted model with all observations and, hence, the BeTo regression model can incorporate these outstanding influential observations.

Concluding remarks
The two-parameter discrete Bell-Touchard family of distributions introduced by Castellares  Lemonte (2022), and its associated regression model for count response variables, has proved to be useful to deal with overdispersion in count data.
We have derived useful diagnostic methods for the Bell-Touchard class of regression models.
It is worth stressing that all diagnostic measures developed in this paper are simple, compact, and can easily be implemented in statistical software, once only simple matrix operations have to be performed.We have also considered the diagnostic measures developed in a real data set to illustrate their usefulness in finding possible influential observations when using the Bell-Toucherd regression model in practice.Finally, we have developed R codes (Team 2022) to perform the diagnostic measures regarding the Bell-Touchard regression model in Section 3. The R code may be obtained upon request.
Sd|, where C ϕ = 1/tr( P ).The sensitivity of the ML estimates β = ( β 1 , . . ., β p ) ⊤ and ϕ under the case-weight perturbation scheme can assessed by considering the largest directions d = d max in C d (β) and C d (ϕ), which are given by the eigenvectors corresponding to the largest eigenvalues of the matrices R

Figure 1 :
Figure 1: Boxplots of the satellites according to the female crab's color (left), and female crab's spine condition (right)

Table 2 :
Relative changes (in %) on the parameter estimates