Analyzing Overdispersed Antenatal Care Count Data in Bangladesh: Mixed Poisson Regression with Individual–Level Random Effects

Poisson regression (PR) is commonly used as the base model for analyzing count data, but it carries the restrictive equidispersion property. Overdispersed count data, however, are very common in the health sciences. In such cases, PR produces misleading inferences and hence gives incorrect interpretations of the results. Mixed Poisson regression with individual-level random effects (MPR ILRE) is a further improvement for analyzing such data. We compare MPR ILRE with PR, quasi-Poisson regression (Q PR) and negative binomial regression (NBR) for modelling overdispersed antenatal care (ANC) count data extracted from the latest Bangladesh Demographic and Health Survey (BDHS) 2014. MPR ILRE is found to be the best choice because it has the minimum Akaike information criterion (AIC) value and also models the overdispersion present in the data very well. Study findings reveal that, on average, women attended fewer than three ANC visits and only 6.5% of women received the eight or more ANC visits recommended by the World Health Organization (WHO) for safe pregnancy and childbirth. Administrative division, place of residence, birth order, media exposure, education, wealth index and body mass index (BMI) have a significant impact on adequate ANC attendance, which is needed to reduce pregnancy complications and maternal and child deaths in Bangladesh.


Introduction
Count data are very common in many practical fields, such as public health, epidemiology, insurance, demography, psychology and actuarial studies. PR is usually the base or standard model for analyzing these data, under the assumption that the mean and variance of the count responses are equal. However, in many real count data analyses this assumption is violated because the responses show greater variability, i.e., a variance higher than the mean. This is well known as overdispersion. Ignoring overdispersion when modelling count data gives misleading inferences and interpretations about the regression parameters. Thus, when analyzing count data, one should take care to handle overdispersion; otherwise the estimates obtained from model fitting may not be precise and consistent. Q PR and NBR models are alternatives to the PR model for dealing with overdispersed count data (Hilbe 2011; Ver Hoef and Boveng 2007). Moreover, an individual-level random effect (ILRE) models the extra-Poisson variation present in the data effectively within the generalized linear mixed models (GLMMs) framework, using a random effect with a separate level for each data value (Harrison 2014).
Maternal health care during pregnancy is an important issue for the development of a country. In many developing countries, like Bangladesh, millions of women experience life-threatening difficulties during their pregnancy. Pregnancy and childbirth complications are the major causes of deaths and disabilities among women in Bangladesh (Chowdhury, Islam, Chakraborty, and Akhter 2007; Latif, Hossain, and Islam 2008; Mohammad, Zahura, and Rahman 2017; Islam and Sultana 2019). Although such complications are very common among pregnant women in Bangladesh, the concerned authorities and researchers have given very limited attention or effort to reducing them by ensuring satisfactory ANC attendance during the whole course of pregnancy.
About 810 maternal deaths were reported worldwide every day in 2017 due to major complications experienced by women during pregnancy and childbirth. Unfortunately, 94% of all these maternal deaths occurred in developing countries (WHO 2019). Between 2000 and 2017, the maternal mortality ratio (MMR), defined as the number of maternal deaths per 100,000 live births, declined by about 38% globally. In 2017, the estimated MMR was 462 maternal deaths in developing countries, whereas it was only 11 in developed countries (WHO 2019). However, in developing countries like Bangladesh the situation has not improved over this period. In recent years the MMR has been steady in Bangladesh: the estimated MMR in 2016 was 196 maternal deaths, nearly identical to the estimate for 2010 (NIPORT, ICDDRB, MEASURE Evaluation 2017). One of the main targets of Sustainable Development Goal 3 (SDG 3) is to reduce the MMR to below 70 maternal deaths by 2030 (WHO 2018), and this is undoubtedly a huge challenge.
ANC helps mothers prepare for risk-free delivery and childbirth by reducing major complications associated with pregnancy and delivery. The utilization of ANC services, especially an adequate number of ANC visits, plays an important role in reducing such complications and hence in lowering maternal and child deaths (Pandit 1992; Titaley, Hunter, Heywood, and Dibley 2010). WHO (2007) suggested that women should receive at least four ANC visits throughout the pregnancy to ensure safe motherhood under usual conditions. However, this requirement was not sufficient to reduce pregnancy complications effectively and hence to ensure safe childbirth. As a result, WHO (2016) updated its guidance and recommended at least eight ANC visits during the whole course of pregnancy: one visit within the first three months (first trimester), two visits between three and six months (second trimester), and five visits in the third trimester. However, the current scenario in Bangladesh shows that women do not meet this minimum target of ANC visits for safe motherhood (NIPORT, Mitra & Associates, ICF International 2016).
ANC attendance among women throughout pregnancy is associated with different socio-economic and demographic variables. The frequency of ANC visits was highly affected by women's age at first birth (Nisar and White 2003). The rate of ANC visits among pregnant women was greater in higher economic groups than in others (Jayaraman, Chandrasekhar, and Gebreselassie 2008). Women made significantly more ANC visits during their first birth than during second or higher-order births (Navaneetham and Dharmalingam 2002). Women living in urban areas were more likely to attend ANC visits than women from rural areas in Bangladesh (Rahman 2009). Women's education was also found to be a significant factor for ensuring adequate ANC visits, for antenatal morbidity, and for reducing child and maternal deaths (Becker, Peters, Gray, Gultiano, and Black 1993; Mohammad et al. 2017; Chakraborty, Islam, Chowdhury, Bari, and Akhter 2003; Chowdhury et al. 2007). A study of ANC attendance among Bangladeshi women who visited a health centre during pregnancy to receive antenatal care reported that urban and educated women who belonged to rich families and had access to mass media were more likely to attend an adequate number of ANC visits than their counterparts (Hossain, Akter, Sultana, and Kabir 2020).
It is evident from the latest BDHS 2014 that Bangladeshi women do not complete the minimum standard number of ANC visits recommended by WHO (2007, 2016) for safe pregnancy and childbirth (NIPORT, Mitra & Associates, ICF International 2016). A number of studies have been conducted to identify the risk factors associated with adequate ANC attendance in Bangladesh (Sultana and Bari 2017; Islam and Masud 2018a,b; Rahman and Hossain 2019; Bhowmik, Das, and Islam 2020). Most of these previous studies ignored the overdispersion present in the ANC count data extracted from BDHS 2014 when modelling such data. The present study takes it into account to avoid misleading inferences and interpretations of results. We therefore detect overdispersion in the data set used in this paper and test whether it is statistically significant. We then analyze the data with suitable statistical models in order to explore the significant factors associated with the WHO recommended number of ANC visits among women receiving antenatal health care during pregnancy in Bangladesh.

Data and sampling design
We used the latest survey data on ANC among women during pregnancy in Bangladesh, extracted from BDHS 2014 (NIPORT, Mitra & Associates, ICF International 2016). The survey used a two-stage stratified random sampling design to collect data. In the first stage, 600 enumeration areas (EAs) were selected with probability proportional to EA size, of which 207 were from urban areas and 393 from rural areas. A sampling frame of households in the selected EAs was then prepared. In the second stage, 30 households on average were taken from each EA. In total, 18,000 households were selected for interview in this survey. A detailed description of the data and survey can be found at the website: https://dhsprogram.com/data/available-datasets.cfm.

Variables
In this study, we aim to identify the potential determinants of adequate ANC attendance, as recommended by WHO, among women during their pregnancy in Bangladesh. The number of ANC visits made by Bangladeshi women to receive health care during pregnancy is the count response variable. Administrative division (region), place of residence, birth order, mother's age at birth, media exposure, education, wealth index, membership of a nongovernment organization (NGO), women's empowerment, education gap between husband and wife, religion, mother having a son and BMI are considered as explanatory variables.
Not all of the selected variables were directly available in the survey data. For example, media exposure, NGO membership and women's empowerment were constructed by combining related covariates. Women who read magazines or newspapers, watch television or listen to the radio at least once per week are considered exposed to mass media. Women who were involved with any one of the NGOs Grameen Bank, Proshika, Bangladesh Rural Development Board (BRDB), Bangladesh Rural Advancement Committee (BRAC) and Association of Social Advancement (ASA) were considered to have NGO membership, and otherwise not. Women who took part in any one of the decisions on buying major household goods, their own and their children's health care, and visiting relatives or family were treated as empowered. After deleting all cases with missing values, a data set of 4390 women was obtained for this study.

Overdispersion in count data analysis
Overdispersion is a common phenomenon in modelling count response data. It arises from excess variability, i.e., the variance of the responses is greater than the mean assumed by a PR model. This is the case for the ANC attendance count data considered in this study. One should take care over the consequences of overdispersion when analyzing count data, because ignoring it leads to incorrect interpretations of the model parameters.

Detection
As a first step, the presence of overdispersion in a real data set can be detected by using the Pearson chi-square ($\chi^2$) statistic (Hilbe 2011), defined by
$$\chi^2 = \sum_{i=1}^{n} \frac{(y_i - \hat{\mu}_i)^2}{V(\hat{\mu}_i)}, \qquad (1)$$
where $y_i$ is the observed count, $V$ is the variance function and $\hat{\mu}_i$ is the expected count. The dispersion value is the Pearson $\chi^2$ statistic divided by the corresponding degrees of freedom. If it is greater than 1, the model is considered overdispersed; a value of 1 indicates equidispersion and a value less than 1 indicates underdispersion.
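The dispersion value described above can be computed in a few lines. The following is a minimal sketch in Python rather than the authors' code: the function name `pearson_dispersion`, the toy counts and the intercept-only fit are illustrative assumptions, and the Poisson variance function $V(\mu) = \mu$ is assumed.

```python
def pearson_dispersion(y, mu, n_params):
    """Pearson chi-square statistic divided by residual degrees of freedom.

    Assumes the Poisson variance function V(mu) = mu.
    """
    if len(y) != len(mu):
        raise ValueError("y and mu must have the same length")
    chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    df = len(y) - n_params  # residual degrees of freedom
    return chi2 / df


# Hypothetical counts (not BDHS data). For an intercept-only Poisson model
# the maximum likelihood fitted mean is the sample mean for every observation.
visits = [0, 0, 1, 2, 2, 3, 4, 8, 0, 10]
fitted = [sum(visits) / len(visits)] * len(visits)
dispersion = pearson_dispersion(visits, fitted, n_params=1)
print(round(dispersion, 3))
```

For these toy counts the dispersion value is well above 1, which is the pattern the text describes as overdispersion. In a real analysis `mu` would be the fitted values of the PR model with covariates and `n_params` the number of estimated regression coefficients.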

Tests
Whether the overdispersion present in the data is statistically significant can be assessed by both score and Lagrange multiplier tests. Three different versions of the score test are given by Dean and Lawless (1989), Cameron and Trivedi (1990) and Winkelmann (2008). Moreover, a Lagrange multiplier test of overdispersion is given by Hilbe (2011).
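To make the tests concrete, the sketch below computes two statistics in their commonly quoted forms. These forms are assumptions and may differ in detail from the versions used in this study. The first is the score statistic $T = \sum_i \{(y_i - \hat{\mu}_i)^2 - y_i\} / \sqrt{2 \sum_i \hat{\mu}_i^2}$ usually attributed to Dean and Lawless (1989), referred to a standard normal distribution. The second is the Lagrange multiplier statistic $(\sum_i \hat{\mu}_i^2 - n\bar{y})^2 / (2 \sum_i \hat{\mu}_i^2)$ usually attributed to Hilbe (2011), referred to a chi-square distribution with one degree of freedom. All function names and the toy data are hypothetical.

```python
import math


def dean_lawless_score(y, mu):
    """Score statistic T = sum((y - mu)^2 - y) / sqrt(2 * sum(mu^2))."""
    num = sum((yi - mi) ** 2 - yi for yi, mi in zip(y, mu))
    den = math.sqrt(2.0 * sum(mi ** 2 for mi in mu))
    return num / den


def lagrange_multiplier(y, mu):
    """LM statistic (sum(mu^2) - n * ybar)^2 / (2 * sum(mu^2))."""
    n = len(y)
    ybar = sum(y) / n
    sum_mu_sq = sum(mi ** 2 for mi in mu)
    return (sum_mu_sq - n * ybar) ** 2 / (2.0 * sum_mu_sq)


def upper_normal_pvalue(z):
    """One-sided p-value P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def chi2_1df_pvalue(x):
    """P(X > x) for a chi-square variable X with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


# Hypothetical counts (not BDHS data) with intercept-only Poisson fitted means.
visits = [0, 0, 1, 2, 2, 3, 4, 8, 0, 10]
fitted = [sum(visits) / len(visits)] * len(visits)

score_stat = dean_lawless_score(visits, fitted)
lm_stat = lagrange_multiplier(visits, fitted)
print(score_stat, upper_normal_pvalue(score_stat))
print(lm_stat, chi2_1df_pvalue(lm_stat))
```

Small p-values from either test point to overdispersion relative to the fitted PR model.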

Models
We consider three different generalized linear models (GLMs), PR, Q PR and NBR (McCullagh and Nelder 1989; Dobson and Barnett 2018), and an extended GLMM, MPR ILRE (Harrison 2014), in order to select the best choice for analyzing the overdispersed ANC count data of women in Bangladesh. PR is the first step in modelling count data and assumes that the mean and variance of the responses are equal. Q PR and NBR are alternatives, as both models accommodate overdispersion in count data. Let $Y$ be a count response variable with expectation $E(Y) = \mu > 0$.
For the Q PR model (Ver Hoef and Boveng 2007),
$$\mathrm{var}(Y) = V(\mu) = \theta\mu, \qquad (6)$$
where $\mathrm{var}(Y)$ is the variance of $Y$, $V$ is the variance function as before and $\theta > 1$ is the overdispersion parameter. However, in the case of the NBR model, the variance of $Y$ can be expressed as
$$\mathrm{var}(Y) = V(\mu) = \mu + \kappa\mu^2 = \mu(1 + \kappa\mu), \qquad (7)$$
where $\kappa > 0$. The overdispersion in (7), i.e., the factor by which the variance exceeds $\mu$, is the multiplicative factor $(1 + \kappa\mu)$, which depends on $\mu$. Note that the variance function $V$ is a linear function of $\mu$ for the Q PR model, whereas it is a quadratic function of $\mu$ for the NBR model.
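The contrast between the linear and quadratic variance functions can be seen numerically. The sketch below is illustrative only: the function names and the parameter values $\theta = 4$ and $\kappa = 1.5$ are hypothetical and are not estimates from the BDHS data.

```python
def quasi_poisson_variance(mu, theta):
    """Variance under Q PR: linear in the mean, var(Y) = theta * mu."""
    return theta * mu


def negative_binomial_variance(mu, kappa):
    """Variance under NBR: quadratic in the mean, var(Y) = mu * (1 + kappa * mu)."""
    return mu * (1.0 + kappa * mu)


# Hypothetical parameter values chosen only for illustration.
theta, kappa = 4.0, 1.5
for mu in (0.5, 2.0, 8.0):
    qp_ratio = quasi_poisson_variance(mu, theta) / mu
    nb_ratio = negative_binomial_variance(mu, kappa) / mu
    print(f"mu={mu}: Q PR var/mean={qp_ratio:.2f}, NBR var/mean={nb_ratio:.2f}")
```

The variance-to-mean ratio stays constant at $\theta$ under Q PR but grows with the mean under NBR, so the two models weight observations with large fitted means differently.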
Let the mean $\mu_i > 0$, $i = 1, \ldots, n$, for the $i$th woman vary as a function of the covariates for that woman. It can be modelled within the GLM framework (McCullagh and Nelder 1989; Dobson and Barnett 2018) as
$$g(\mu_i) = \log(\mu_i) = \eta_i = \mathbf{x}_i^{\top}\boldsymbol{\beta}, \qquad (8)$$
where $g$ and $\eta_i$ are the log-link function and the $i$th linear predictor, respectively. Equation (8) can be rewritten in matrix notation as
$$\log(\boldsymbol{\mu}) = \boldsymbol{\eta} = X\boldsymbol{\beta}, \qquad (9)$$
where $\boldsymbol{\mu}$ is the mean vector, $X$ is the $n \times p$ design matrix with $p = k + 1$, $\mathbf{x}_i^{\top}$ is the $i$th row vector of $X$, $\boldsymbol{\eta}$ is the vector of linear predictors and $\boldsymbol{\beta} = (\beta_0, \beta_1, \beta_2, \ldots, \beta_k)^{\top}$ is the $p \times 1$ vector of regression coefficients. As a further improvement in modelling count data, one may consider MPR ILRE in the context of GLMMs (Stroup 2012) by introducing random effects into the linear predictor of (9) as
$$\log(\boldsymbol{\mu}) = \boldsymbol{\eta} = X\boldsymbol{\beta} + \mathbf{u}, \qquad (10)$$
where $\boldsymbol{\mu} = E(\mathbf{Y} \mid \mathbf{u})$ is now the conditional mean vector of the responses given the random effects, $\mathbf{Y}$ is an $n \times 1$ vector of responses and $\mathbf{u}$ is an $n \times 1$ vector of individual-level random effects. The components of $\mathbf{u}$ are assumed to be uncorrelated and normally distributed for computational simplicity. We estimate the model parameters of the MPR ILRE by maximum likelihood using the Laplace approximation, via the R package glmmTMB (Magnusson, Skaug, Nielsen, Berg, Kristensen, Maechler, Bentham, Koen, Sadat, Bolker, and Brooks 2020).
We use AIC as the model selection criterion, which can be written as (Akaike 1998)
$$\mathrm{AIC} = -2\,l(\hat{\boldsymbol{\theta}}; \mathbf{y}) + 2p, \qquad (11)$$
where $p$ is the number of regression parameters to be estimated, $l(\hat{\boldsymbol{\theta}}; \mathbf{y})$ is the log-likelihood, $\hat{\boldsymbol{\theta}}$ is the vector of estimated model parameters and $\mathbf{y}$ is the data vector. The model with the smallest AIC value is considered the best choice for analyzing the overdispersed ANC count data of women in Bangladesh. For interpretation and a better understanding of the impact of the explanatory variables on the count response, it is useful to consider the incidence rate ratio (IRR) instead of the regression coefficients. For a specific covariate $x_j$, $j = 1, \ldots, k$, the IRR is estimated as $\mathrm{IRR}_j = e^{\hat{\beta}_j}$, where $\hat{\beta}_j$ is the $j$th estimated regression coefficient.
Table 1 shows the frequency and percentage distribution of ANC visits for the 4390 women during their last pregnancy in the three years prior to the survey date. It is observed that 21.3% of women did not visit any health centre to receive ANC during their whole pregnancy. According to the WHO (2007) guideline, the minimum standard requirement for pregnant women is at least four ANC visits for safe motherhood, whereas only 32.2% of Bangladeshi women met this requirement during their pregnancy. It can also be seen that a large majority of participants, 93.5% of women, visited a medically trained ANC provider fewer than eight times; hence only 6.5% of women attended the eight or more ANC visits recommended by WHO (2016) for safe pregnancy and childbirth. Moreover, women made on average fewer than three ANC visits (mean=2.86). The variance (14.79) of the ANC count responses was also found to be substantially higher than the mean. The percentage distributions of women across the categories of the socio-economic and demographic variables are presented in Table 2. Of the seven administrative divisions in Bangladesh, 51.9% of women were from three major divisions, namely Chittagong (19.2%), Dhaka (17.7%) and Sylhet (15.0%). The percentages of women were very similar in the other regions: Barisal (11.9%), Khulna (11.7%), Rajshahi (12.2%) and Rangpur (12.3%). Most of the participants were Muslim women (91.9%), lived in rural areas (67.9%) and were empowered to give an opinion on important decisions in their family (80.3%). The percentages of women having media exposure (62.2%), a higher education gap (82.3%), a son (67.8%) and no NGO membership (67.9%) were substantially higher than those of their counterparts.
Most of the women were aged 35 years or below (95.9%), and 31.3% of them were aged below 20 years; most had a third or lower-order birth (86%), and 40.5% had their first birth during the survey period. Of the total respondents, 13.3% of women had no educational attainment and 86.7% had at least a primary level of education. About 40% of women were from poor families, and 41.4% of mothers were either underweight (24.5%), overweight (14.1%) or obese (2.8%).
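As a rough illustration of what the reported mean (2.86) and variance (14.79) of ANC visits imply, the sketch below backs out crude method-of-moments values for the overdispersion parameters of the three alternative models. This is an assumption-laden calculation and not the fitted estimates of this study: it ignores all covariates, and for the individual-level random effect it assumes a normal random effect with variance $\sigma^2$ on the log scale, for which the marginal variance is $m + m^2(e^{\sigma^2} - 1)$ with $m$ the marginal mean. The variable names are hypothetical.

```python
import math

mean_visits = 2.86   # sample mean of ANC visits reported in Table 1
var_visits = 14.79   # sample variance of ANC visits reported in Table 1

# Q PR: var(Y) = theta * mu  =>  theta = variance / mean
theta_hat = var_visits / mean_visits

# NBR: var(Y) = mu + kappa * mu^2  =>  kappa = (variance - mean) / mean^2
kappa_hat = (var_visits - mean_visits) / mean_visits ** 2

# Poisson with a normal individual-level random effect on the log scale:
# var(Y) = m + m^2 * (exp(sigma^2) - 1)  =>  sigma^2 = log(1 + kappa)
sigma2_hat = math.log(1.0 + kappa_hat)

print(f"theta  ~ {theta_hat:.2f}")
print(f"kappa  ~ {kappa_hat:.2f}")
print(f"sigma2 ~ {sigma2_hat:.2f}")
```

These marginal values only indicate the scale of the extra-Poisson variation. Estimates from models fitted with covariates will generally differ, because the covariates explain part of the variability.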

Results and discussion
An analysis of variance (ANOVA) test was performed to examine whether the average number of ANC visits differs significantly between the groups of each covariate. The summary results are given in Table 2. From the p-values in Table 2, there was a highly significant association between the average number of ANC visits and place of residence (p<0.001), birth order (p<0.001), media exposure (p<0.001), education (p<0.001), wealth index (p<0.001), women's empowerment (p=0.002) and BMI (p<0.001) of pregnant women in Bangladesh. It can also be seen that administrative division (p=0.012) and mothers having a son (p=0.044) were significantly associated with mean ANC visits at the 2% and 5% levels, respectively. These nine significant covariates are considered later in the adjusted multivariate analysis to investigate the potential determinants of ANC attendance among women during pregnancy in Bangladesh. From Table 2, the highest mean number of ANC visits was 3.64 (95% confidence interval (CI): 3.21-4.07) for women of the Khulna region. On the other hand, the lowest average number of ANC visits (mean=2.03, 95% CI: 1.86-2.21) was found among women in the Sylhet region. Women living in urban areas of Bangladesh had a higher average number of ANC visits (mean=3.82, 95% CI: 3.58-4.05) than women in rural areas (mean=2.42, 95% CI: 2.29-2.54). The average number of ANC visits was higher at the first birth (mean=3.26, 95% CI: 3.07-3.45) and lower for births above the third (mean=1.69, 95% CI: 1.53-1.85). Women aged below 20 years at birth had the highest mean number of ANC visits (mean=2.88, 95% CI: 2.65-3.11), while the lowest value was for women aged above 35 years at birth (mean=2.67, 95% CI: 1.58-3.77).
The mean number of ANC visits was greater for women exposed to media (mean=3.48, 95% CI: 3.32-3.63) than for women who were not (mean=1.85, 95% CI: 1.70-2.00). The mean number of ANC visits during pregnancy was lowest among illiterate women (mean=1.48, 95% CI: 1.32-1.63) and highest among higher educated women (mean=4.88, 95% CI: 4.44-5.32). The average number of ANC visits increased with increasing levels of education. From Table 2, it is also evident that women belonging
For modelling and multivariate analysis of the ANC count data, we recall that the variance of the response variable (number of ANC visits) was greater than its mean (Table 1). It follows that overdispersion should be taken into consideration in the data analysis. This was done by first detecting overdispersion and then testing whether it is statistically significant. To detect the presence of overdispersion, we first fitted the PR model and computed the dispersion value using the Pearson $\chi^2$ statistic. This value was 4.36 (greater than 1; see Table 3), which clearly indicates the existence of overdispersion in the ANC count data. From the p-values of both the score (p = 0.008) and Lagrange multiplier (p < 0.001) tests in the right panel of Table 3, it is evident that the overdispersion was statistically highly significant and that the fitted PR model is overdispersed. We then compared the performance of the different regression models in order to select the best one for analyzing the overdispersed ANC attendance count data of women in Bangladesh. The AIC and dispersion values were computed for the PR, Q PR, NBR and MPR ILRE models and are given in the left panel of Table 3. The AIC value (18049) is the minimum for MPR ILRE, and its dispersion value (0.49) is also closer to 1 than that of the other models.
It follows that MPR ILRE is the best choice for analyzing the ANC visits count data, because it has the smallest AIC value and also captures the overdispersion very well. Finally, the determinants of ANC visits among women during their pregnancy in Bangladesh were identified by fitting the MPR ILRE model, and the results are summarized in Table 4.
access to media was 24% (p<0.001, IRR=1.24, 95% CI: 1.126-1.348) higher than among women who do not have any access to mass media. Media exposure thus has a significant positive impact on the ANC visits of women during their pregnancy in Bangladesh.
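The reading of IRR=1.24 as a 24% higher rate follows directly from the definition $\mathrm{IRR}_j = e^{\hat{\beta}_j}$. The sketch below shows the conversion in both directions using the values reported above. The helper names are hypothetical, and the standard error is backed out under the assumption that the reported interval is a Wald interval of the form $\exp(\hat{\beta} \pm 1.96\,\mathrm{se})$, which the text does not state.

```python
import math


def irr_to_percent_change(irr):
    """Percentage change in the expected count relative to the reference group."""
    return (irr - 1.0) * 100.0


def wald_se_from_irr_ci(lower, upper, z=1.96):
    """Standard error of the log-scale coefficient implied by a Wald CI for the IRR."""
    return (math.log(upper) - math.log(lower)) / (2.0 * z)


irr_media = 1.24                      # reported IRR for media exposure
ci_lower, ci_upper = 1.126, 1.348     # reported 95% CI for the IRR

beta_media = math.log(irr_media)      # coefficient on the log scale
percent_change = irr_to_percent_change(irr_media)
se_media = wald_se_from_irr_ci(ci_lower, ci_upper)  # assumes a Wald interval

print(f"beta = {beta_media:.3f}, change = {percent_change:.0f}%, se = {se_media:.3f}")
```

An IRR below 1 would be read in the same way as a percentage reduction in the expected number of visits.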

Conclusion
In this study, we used data from the latest BDHS 2014 survey to assess the current scenario of ANC utilization, and its potential determinants, among women during their pregnancy in Bangladesh. It was estimated that only 6.5% of women received the eight or more ANC visits recommended by WHO (2016) during their pregnancy. It was also found that women received on average fewer than three ANC visits during the whole course of pregnancy. The variance of the number of ANC visits was higher than the mean, and hence the equidispersion assumption of Poisson regression was violated. We detected and tested the overdispersion present in the data and found it to be statistically significant. Thus, we suggest that overdispersion should be accounted for in count data analysis so that the regression parameters are estimated precisely. Study findings reveal that the MPR ILRE model performed better than the three GLMs, PR, Q PR and NBR, in the presence of substantial overdispersion in the ANC data of women in Bangladesh. Thus, the significant factors affecting adequate ANC visits of women were determined by using the selected MPR ILRE model in the multivariate analysis.
The findings of this study showed that administrative division or region, place of residence, birth order, mass-media exposure, education, wealth index and BMI of Bangladeshi women have a significant effect on ANC visits during their pregnancy. Women living in urban areas, belonging to rich families, with higher educational attainment and having mass-media exposure received more ANC visits for safe pregnancy. According to the findings of this study, significant steps should be taken to make women aware of, and motivate them towards, a higher number of ANC visits during pregnancy irrespective of their birth order. Necessary efforts should also be made to educate women in Bangladesh, to ensure easy access to mass media and to help women take care of their weight, for better antenatal care in Bangladesh. More precisely, the concerned authorities need to provide appropriate health care facilities in rural areas and for women who belong to poor families, in order to reduce pregnancy complications and hence maternal and child mortality in Bangladesh.