Assessing the Goodness of Fit of the Gompertz Model in the Presence of Right and Interval Censored Data with Covariate

This research focuses on assessing the goodness of fit of the Gompertz model in the presence of right and interval censored data with a covariate. The performance of the maximum likelihood estimates was evaluated via a simulation study at various censoring proportions and sample sizes. Conclusions were drawn from the bias, standard error and root mean square error at the different settings. A second simulation study was then carried out to compare the performance of the proposed modifications of the Cox-Snell residuals for both censored and uncensored observations at different combinations of sample size and censoring level. The results show that the standard error and root mean square error of the parameter estimates increase as the censoring proportion increases and as the sample size decreases. This indicates that the estimates perform better when sample sizes are larger and censoring proportions are lower. The proposed modifications of the Cox-Snell residuals performed slightly better than the existing method.


Introduction
Survival analysis consists of statistical procedures for analysing data in which the outcome variable is the time until an event occurs, often referred to as time-to-event data. It has become a popular tool in observational and experimental studies, primarily in public health, epidemiology, and the medical and biological sciences (Klein and Moeschberger 2003; Lee and Wang 2003). Traditional statistical procedures are not equipped to handle censored observations, a special type of missing data that occurs in survival analysis when subjects do not experience the event of interest during the follow-up time. Moreover, survival data are typically not symmetrically distributed and tend to be positively skewed (Collett 2003).
Censoring occurs when we have some information about an individual's survival time but do not know it exactly. Exact, or uncensored, observations are recorded for individuals who die during the study period; their survival time is the time from the start of the experiment until death. The three most common types of censoring in survival analysis are right, left and interval censoring. Right censoring occurs when the true survival time is equal to or greater than the observed survival time; in other words, the individual is still alive at the observed time. Left censoring arises when the individual has experienced the event of interest before the start of the study but the exact time of occurrence is unknown. In interval censoring, the true survival time lies within a known time interval instead of being observed exactly (Klein and Moeschberger 2003; Lee and Wang 2003; Kleinbaum and Klein 2012). Interval censored data are very common in medical research, where patients are inspected at intervals. The lifetime is then only known to fall within an interval, $L_i < t_i < R_i$, where $L_i$ and $R_i$ are the left and right endpoints. In this study, we focus on both right and interval censoring.
Although there are well known methods for estimating unconditional survival distributions, most interesting survival modeling examines the relationship between survival and one or more predictors, known as covariates. Residuals are a widely used tool for assessing the adequacy of a model. When modeling survival data, however, it is not as easy to define a residual as it is for a general linear model, and a number of different residuals have therefore been proposed. It is common practice to use Cox-Snell residuals to check the overall goodness of fit of survival models (Cox and Snell 1968).
In this paper, we consider the Gompertz distribution with a covariate in the presence of right and interval censored data and study the performance of this model extensively. A simulation study is carried out to evaluate the maximum likelihood estimation (MLE) procedure for the parameters of the Gompertz model at various censoring proportions and sample sizes by computing the bias, standard error (SE) and root mean square error (RMSE). Following that, we propose several modifications of the Cox-Snell residuals and analyze their performance comprehensively at different sample sizes and censoring proportions.
The Gompertz distribution was originally developed by the British actuary Gompertz (1825) to model human mortality and establish actuarial tables. The Gompertz law of mortality states that death rates increase exponentially with age. Over the past one and a half centuries, many researchers have contributed to the statistical methodology and characterization of this distribution. Garg, Rao, and Redmond (1970) studied the properties of the Gompertz distribution and compared estimates obtained by least squares and by maximum likelihood. Gordon (1990) considered maximum likelihood estimation of the parameters of a mixture of two Gompertz distributions when censoring occurs. Subsequently, Witten and Satzer (1992) addressed the parameter sensitivity of a new method for estimating the parameters of the Gompertz mortality rate model. Wilson (1994) compared the Gompertz, Weibull and logistic functions in the analysis of mortality data. Chen (1997) developed an exact confidence interval and an exact joint confidence region for the parameters of the Gompertz distribution, while Wu, Hung, and Tsai (2004) proposed unweighted and weighted least squares estimates of the parameters under complete data and first-failure-censored data. Lenart (2012) compared method-of-moments and maximum likelihood estimates for the Gompertz distribution. Kiani, Arasan, and Midi (2012) examined the performance of the Gompertz model with a time-dependent covariate in the presence of right censored data and applied two confidence interval estimation techniques, Wald and jackknife. Kiani and Arasan (2013) extended the Gompertz model to incorporate a time-dependent covariate in the presence of interval-, right- and left-censored and uncensored data.
Abu-Zinadah (2014) estimated the parameters of the three-parameter exponentiated Gompertz distribution by maximum likelihood and performed goodness-of-fit tests based on complete and type II censored samples. Later, El-Din, Abdel-Aty, and Abu-Moussa (2016) estimated the parameters of the Gompertz distribution by maximum likelihood and by Bayesian methods under three different loss functions. More recently, de Andrade, Chakraborty, Handique, and Gomes-Silva (2019) studied a five-parameter model based on a new generalization of the extended Gompertz distribution, known as the exponentiated generalized extended Gompertz distribution.
Weissfeld and Schneider (1990) discussed methods for detecting influential observations in the Weibull model fitted to censored data, including one-step deletion diagnostics, influence functions and curvature diagnostics. Leung, Elashoff, and Afifi (1997) summarised various methods for dealing with censored data, including complete data analysis, imputation techniques, analysis based on dichotomized data and the likelihood-based approach. Farrington (2000) applied several diagnostic tools, such as Cox-Snell, Lagakos (or martingale), deviance and Schoenfeld residuals, to proportional hazards models for interval-censored survival data. Sparling, Younes, Lachin, and Bautista (2006) presented a parametric family of regression models for interval-censored event-time (survival) data that accommodates both fixed and time-dependent covariates. Prinja, Gupta, and Verma (2010) noted that interval censoring arises when the time to event is known only up to a time interval, as happens when monitoring assessments are carried out periodically.
Kiani and Arasan (2018) discussed a survival model with doubly interval censored data and a time-dependent covariate, where the lifetime is the elapsed time between two related events, both of which are interval censored. Sakurai and Hattori (2018) developed a model-checking procedure based on cumulative martingale residuals for interval-censored observations.

Gompertz model with right and interval censored data and covariate
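Let $T$ be a non-negative continuous random variable denoting the survival time. The probability density function of the Gompertz distribution is given by
\[ f(t) = \lambda \exp(\gamma t) \exp\left\{ \frac{\lambda}{\gamma} \left[ 1 - \exp(\gamma t) \right] \right\}. \tag{1} \]
The corresponding survivor function is given by
\[ S(t) = \exp\left\{ \frac{\lambda}{\gamma} \left[ 1 - \exp(\gamma t) \right] \right\}. \]
The hazard function is
\[ h(t) = \lambda \exp(\gamma t). \]
The effect of covariates on survival time can be incorporated into the hazard function by letting the parameter $\lambda$ be a function of the covariates. For a data set with a covariate $x_i$, where $i = 1, 2, \ldots, n$, the hazard function for the $i$th subject can be expressed as
\[ h(t_i \mid x_i) = \lambda_i \exp(\gamma t_i), \]
where
\[ \lambda_i = \exp(\beta_0 + \beta_1 x_i). \]
Therefore, the hazard function is
\[ h(t_i \mid x_i) = \exp(\beta_0 + \beta_1 x_i + \gamma t_i). \]
The probability density function is
\[ f(t_i \mid x_i) = \exp(\beta_0 + \beta_1 x_i + \gamma t_i) \exp\left\{ \frac{\exp(\beta_0 + \beta_1 x_i)}{\gamma} \left[ 1 - \exp(\gamma t_i) \right] \right\}, \]
with the corresponding survivor function given by
\[ S(t_i \mid x_i) = \exp\left\{ \frac{\exp(\beta_0 + \beta_1 x_i)}{\gamma} \left[ 1 - \exp(\gamma t_i) \right] \right\}. \]
The parameters of this model can be estimated by the method of maximum likelihood (MLE), where $\theta = (\beta_0, \beta_1, \gamma)$ is the vector of parameters.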
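The model functions can be sketched in a few lines of Python. This is a minimal illustration, assuming the common Gompertz parametrization $h(t) = \lambda \exp(\gamma t)$ and a log-linear link $\lambda_i = \exp(\beta_0 + \beta_1 x_i)$; the function names are hypothetical and are not taken from the study.

```python
import math


def gompertz_rate(beta0, beta1, x):
    """Subject-specific rate lambda_i = exp(beta0 + beta1 * x_i) (assumed log-linear link)."""
    return math.exp(beta0 + beta1 * x)


def hazard(t, lam, gamma):
    """Gompertz hazard h(t) = lambda * exp(gamma * t)."""
    return lam * math.exp(gamma * t)


def survivor(t, lam, gamma):
    """Gompertz survivor function S(t) = exp{(lambda / gamma) * (1 - exp(gamma * t))}."""
    return math.exp((lam / gamma) * (1.0 - math.exp(gamma * t)))


def density(t, lam, gamma):
    """Gompertz density, obtained as f(t) = h(t) * S(t)."""
    return hazard(t, lam, gamma) * survivor(t, lam, gamma)


# Illustration with the parameter values used in the simulation study
# (beta0 = -5, beta1 = 0.3, gamma = 0.5) and an arbitrary covariate value.
lam = gompertz_rate(-5.0, 0.3, x=1.0)
for t in (1.0, 5.0, 10.0):
    print(t, hazard(t, lam, 0.5), survivor(t, lam, 0.5), density(t, lam, 0.5))
```

Writing the density as the product of hazard and survivor function keeps the three functions consistent with each other by construction.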

Maximum likelihood estimation
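To incorporate right and interval censored data into the likelihood function, indicator variables are defined for the $i$th observation that mark whether it is observed exactly, right censored or interval censored. The likelihood function for the full sample consisting of complete, right censored and interval censored data is then the product, over all observations, of the density at the observed time for an exact observation, the survivor function at the censoring time for a right censored observation, and the difference between the survivor function at the left and right endpoints for an interval censored observation. The log-likelihood function is the sum of the logarithms of these contributions. The inverse of the observed information matrix, $[i(\hat{\beta}_0, \hat{\beta}_1, \hat{\gamma})]^{-1}$, obtained from the second partial derivatives of the log-likelihood function evaluated at $\hat{\beta}_0$, $\hat{\beta}_1$ and $\hat{\gamma}$, provides the estimates of the variances and covariances.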
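The structure of the log-likelihood can be illustrated with a short Python sketch. It assumes the Gompertz parametrization $h(t) = \lambda_i \exp(\gamma t)$ with $\lambda_i = \exp(\beta_0 + \beta_1 x_i)$. The record layout `(kind, left, right, x)` and the labels `"exact"`, `"right"` and `"interval"` are hypothetical stand-ins for the indicator variables, not the notation of the study.

```python
import math


def log_survivor(t, lam, gamma):
    """log S(t) for the Gompertz model: (lambda / gamma) * (1 - exp(gamma * t))."""
    return (lam / gamma) * (1.0 - math.exp(gamma * t))


def log_likelihood(theta, data):
    """Log-likelihood for exact, right censored and interval censored records.

    theta = (beta0, beta1, gamma); each record is (kind, left, right, x), where
    `left` holds the observed or censoring time and `right` is only used for
    interval censored records (hypothetical layout).
    """
    beta0, beta1, gamma = theta
    total = 0.0
    for kind, left, right, x in data:
        lam = math.exp(beta0 + beta1 * x)
        if kind == "exact":
            # log f(t) = log h(t) + log S(t)
            total += math.log(lam) + gamma * left + log_survivor(left, lam, gamma)
        elif kind == "right":
            # the event occurs after the censoring time: log S(c)
            total += log_survivor(left, lam, gamma)
        elif kind == "interval":
            # the event occurs between the endpoints: log[S(L) - S(R)]
            s_left = math.exp(log_survivor(left, lam, gamma))
            s_right = math.exp(log_survivor(right, lam, gamma))
            total += math.log(s_left - s_right)
        else:
            raise ValueError("unknown observation type: %r" % (kind,))
    return total


# Made-up records, for illustration only.
records = [
    ("exact", 7.2, None, 0.4),
    ("right", 5.0, None, -1.1),
    ("interval", 6.0, 8.0, 0.2),
]
print(log_likelihood((-5.0, 0.3, 0.5), records))
```

In practice this function would be passed to a numerical optimizer, and the observed information matrix would be obtained from its second derivatives at the maximum; both steps are omitted from the sketch.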

Simulation study and results
3.1. Assessing performance of the parameter estimates
A simulation study using 1000 samples, each with n = 30, 40, 50, 80 and 100, was conducted for this model for both censored and uncensored observations with a fixed covariate, $x_i$. The covariate values were simulated independently from the standard normal distribution. The values -5, 0.3 and 0.5 were chosen for the parameters $\beta_0$, $\beta_1$ and $\gamma$ to mimic real-life survival data. A sequence of random numbers, $u_i$, from the standard uniform distribution on the interval (0, 1) was generated to produce the lifetimes $t_i$ for subjects $i = 1, 2, \ldots, n$ by inverting the survivor function of the model. The censoring time, $c_i$, was generated from an exponential distribution whose parameter $\mu$ was adjusted to obtain the desired approximate censoring proportion (cp), with cp = 0%, 10%, 20%, 30%, 40% and 50%. A simulated survival time is considered censored if $t_i > c_i$ and is then replaced by the corresponding censoring time.
To evaluate the performance of the estimator at different combinations of sample size and censoring proportion, the bias, standard error (SE) and root mean square error (RMSE) of each parameter estimate were calculated, with
\[ \mathrm{RMSE} = \sqrt{\mathrm{SE}^2 + \mathrm{bias}^2}. \]
Table 3 indicates that the root mean square error values also increase as the censoring proportion increases. This indicates poorer performance of the parameter estimates at smaller sample sizes and higher censoring proportions, whereas larger sample sizes and lower censoring proportions give more accurate and efficient parameter estimates.
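The data generation and the summary measures can be sketched as follows. The sketch assumes the Gompertz survivor function $S(t \mid x_i) = \exp\{(\lambda_i/\gamma)[1 - \exp(\gamma t)]\}$ with $\lambda_i = \exp(\beta_0 + \beta_1 x_i)$, solved for $t$ at $S = u_i$. It further assumes that $\mu$ is the mean of the exponential censoring distribution and that the SE uses divisor $N$, so that $\mathrm{RMSE}^2$ equals the mean squared error exactly. The function names and the value `censor_mean=25.0` are illustrative only, and only right censoring is generated.

```python
import math
import random


def simulate_lifetime(u, lam, gamma):
    """Inverse transform: solve S(t) = u for t under the assumed Gompertz model."""
    return math.log(1.0 - (gamma / lam) * math.log(u)) / gamma


def simulate_sample(n, beta0, beta1, gamma, censor_mean, rng):
    """Return a list of (time, event, x); event is 1 if exact and 0 if right censored."""
    sample = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)               # covariate from N(0, 1)
        lam = math.exp(beta0 + beta1 * x)
        u = 1.0 - rng.random()                # uniform on (0, 1], avoids log(0)
        t = simulate_lifetime(u, lam, gamma)
        c = rng.expovariate(1.0 / censor_mean)  # assumed: mu is the mean
        if t > c:
            sample.append((c, 0, x))          # censored: keep the censoring time
        else:
            sample.append((t, 1, x))
    return sample


def bias_se_rmse(estimates, true_value):
    """Bias, SE (divisor N, an assumption) and RMSE = sqrt(SE^2 + bias^2)."""
    n = len(estimates)
    mean = sum(estimates) / n
    bias = mean - true_value
    se = math.sqrt(sum((e - mean) ** 2 for e in estimates) / n)
    return bias, se, math.sqrt(se ** 2 + bias ** 2)


rng = random.Random(2024)
sample = simulate_sample(100, -5.0, 0.3, 0.5, censor_mean=25.0, rng=rng)
censored_share = sum(1 for _, event, _ in sample if event == 0) / len(sample)
print("observed censoring proportion:", censored_share)
```

In a full study the censoring mean would be tuned until the observed censoring proportion is close to the target cp, and `bias_se_rmse` would be applied to the 1000 estimates of each parameter.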

Modification of Cox-Snell residuals
Cox-Snell residuals, $r_{Ci}$, are widely used in the analysis of survival data to assess a model's goodness of fit, as discussed by Cox and Snell (1968). The Cox-Snell residual for the $i$th individual, $i = 1, 2, \ldots, n$, is given by
\[ r_{Ci} = \hat{H}_i(t_i) = -\ln \hat{S}_i(t_i), \]
where $\hat{H}_i$ and $\hat{S}_i$ are the fitted cumulative hazard and survivor functions for that individual. A log-cumulative hazard plot of the residuals is obtained by plotting the estimated log-cumulative hazard of the residuals, $\ln[-\ln \hat{S}(r_{Ci})]$, against $\ln(r_{Ci})$. A well-fitting model will exhibit a straight line through the origin with unit gradient. It should be noted that it takes a particularly ill-fitting model for the Cox-Snell residuals to deviate significantly from this line.
One criticism of Cox-Snell residuals is that they do not account for censored observations. Modified Cox-Snell residuals were therefore devised by Crowley & Hu (1977) in Collett (2003), whereby the standard Cox-Snell residual, $r_{Ci}$, is used for uncensored observations and $r_{Ci} + \Delta$ is used for censored observations. The first version takes $\Delta = 1$. Crowley & Hu (1977) in Collett (2003) found that the addition of unity to a Cox-Snell residual for a censored observation inflated the residual to too great an extent. Thus, by using the median value of the excess residual, $\Delta = \log(2) = 0.693$, a second version of the modified Cox-Snell residual is
\[ r^{1}_{Ci} = \begin{cases} r_{Ci}, & \text{for uncensored observations}, \\ r_{Ci} + 0.693, & \text{for censored observations}. \end{cases} \]
In this research we propose two modifications of the Cox-Snell residuals as follows:
\[ r^{2}_{Ci} = \begin{cases} r_{Ci}, & \text{for uncensored observations}, \\ r_{Ci} + g, & \text{for censored observations}, \end{cases} \]
and
\[ r^{3}_{Ci} = \begin{cases} r_{Ci}, & \text{for uncensored observations}, \\ r_{Ci} + h, & \text{for censored observations}, \end{cases} \]
where $g$ is the geometric mean of the data and $h$ is the harmonic mean of the data.
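A minimal Python sketch of the residuals and their modifications is given here. It assumes the Gompertz cumulative hazard $H(t \mid x_i) = (\lambda_i/\gamma)[\exp(\gamma t) - 1]$ with $\lambda_i = \exp(\beta_0 + \beta_1 x_i)$. The text does not specify which values "the data" refers to; the sketch takes them to be the observed times, which is an assumption, and any other set of positive values could be passed instead. All names and numbers are illustrative.

```python
import math
import statistics


def cox_snell(t, x, beta0, beta1, gamma):
    """Cox-Snell residual r = H(t | x) = -log S(t | x) under the assumed Gompertz model."""
    lam = math.exp(beta0 + beta1 * x)
    return (lam / gamma) * (math.exp(gamma * t) - 1.0)


def modified_residuals(residuals, censored, shift):
    """Add `shift` to the residuals of censored observations only."""
    return [r + shift if is_cens else r for r, is_cens in zip(residuals, censored)]


# Made-up observed times, covariates and censoring flags.
times = [2.0, 4.5, 6.0, 8.0]
covariates = [0.1, -0.4, 1.2, 0.0]
censored = [False, True, False, True]
theta_hat = (-5.0, 0.3, 0.5)  # stands in for fitted estimates

r = [cox_snell(t, x, *theta_hat) for t, x in zip(times, covariates)]

# Assumption: "the data" are the observed times.
g = statistics.geometric_mean(times)
h = statistics.harmonic_mean(times)

r_log2 = modified_residuals(r, censored, math.log(2))  # existing version (+0.693)
r_geo = modified_residuals(r, censored, g)             # proposed, geometric mean
r_har = modified_residuals(r, censored, h)             # proposed, harmonic mean

for row in zip(r, r_log2, r_geo, r_har):
    print(row)
```

Keeping the shift as an argument makes the three versions differ only in the constant added to censored residuals, which matches how they are defined.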

Simulation study
A simulation study using 1000 samples, each with n = 30, 40, 50, 80 and 100 and cp = 0%, 10%, 20%, 30%, 40% and 50%, was conducted to compare the residuals. Plots based on the residuals were used for graphical assessment of the adequacy of the fitted model. The plot of $\ln[-\ln \hat{S}(r_{Ci})]$ against $\ln(r_{Ci})$ should be a straight line through the origin with unit slope if the model fits the data. The modifications of the Cox-Snell residuals were applied and their performance was compared for censored and uncensored data. The table indicates that as the sample size increases, the intercept becomes closer to zero and the slope becomes closer to 1. Values of r and $R^2$ closer to 1 indicate a strong linear relationship. However, as the censoring proportion becomes higher, the intercept and slope move further from zero and 1, respectively, and the range of the r and $R^2$ values becomes wider.
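The intercept, slope, r and $R^2$ of the log-cumulative hazard plot can be computed as in the following sketch. It assumes that $\hat{S}$ is the Kaplan-Meier estimate of the survivor function of the residuals and that the line is fitted by ordinary least squares; the text does not state either choice, and the function names are hypothetical.

```python
import math


def kaplan_meier(values, events):
    """Kaplan-Meier estimate at each distinct value that has at least one event."""
    ordered = sorted(zip(values, events))
    at_risk = len(ordered)
    surv = 1.0
    steps = []
    i = 0
    while i < len(ordered):
        value = ordered[i][0]
        deaths = 0
        removed = 0
        while i < len(ordered) and ordered[i][0] == value:
            deaths += ordered[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / at_risk
            steps.append((value, surv))
        at_risk -= removed
    return steps


def log_cumhaz_fit(residuals, events):
    """Least-squares line through (ln r, ln[-ln S(r)]); returns intercept, slope, r, R^2."""
    points = [(math.log(res), math.log(-math.log(s)))
              for res, s in kaplan_meier(residuals, events)
              if 0.0 < s < 1.0]  # the log-log transform is undefined at S = 0 or 1
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mean_x) ** 2 for p in points)
    syy = sum((p[1] - mean_y) ** 2 for p in points)
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    corr = sxy / math.sqrt(sxx * syy)
    return intercept, slope, corr, corr ** 2


# Made-up residuals; event = 1 for uncensored and 0 for censored observations.
demo_residuals = [0.05, 0.21, 0.40, 0.55, 0.90, 1.30, 1.75, 2.60]
demo_events = [1, 1, 0, 1, 1, 0, 1, 1]
print(log_cumhaz_fit(demo_residuals, demo_events))
```

An intercept near zero and a slope near 1, together with r and $R^2$ near 1, would then point to an adequate fit.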

Conclusion
Based on the bias, standard error and root mean square error, we conclude that the parameter estimates perform more poorly at higher censoring proportions and smaller sample sizes. In other words, the estimates perform better when sample sizes are larger and censoring proportions are lower.
We can also conclude that a larger sample size brings the intercept closer to zero and the slope closer to one, while the range of the r and $R^2$ values becomes wider as the censoring proportion increases. Based on the results, the proposed modification of the Cox-Snell residual using the geometric mean performs slightly better than the existing methods.