Generalized Topp-Leone-Weibull AFT Modelling: A Bayesian Analysis with MCMC Tools using R and Stan

The generalized Topp-Leone-Weibull (GTL-W) distribution is a generalization of Weibull distribution which is obtained by using generalized Topp-Leone (GTL) distribution as a generator and considering Weibull distribution as a baseline distribution. Weibull distribution is a widely used survival model that has monotoneincreasing or decreasing hazard. But it cannot accommodate bathtub shaped and unimodal shaped hazards. As a survival model, GTL-W distribution is more flexible than the Weibull distribution to accommodate different types of hazards. The present study aims at fitting GTL-W model as an accelerated failure time (AFT) model to censored survival data under Bayesian setting using R and Stan languages. The GTL-W AFT model is compared with its sub-model and the baseline model. The Bayesian model selection criteria LOOIC and WAIC are applied to select the best model.


Introduction
introduced a new probability distribution having one parameter on the support (0, 1). Nadarajah and Kotz (2003) discussed this distribution and derived many relevant functions and quantities, for instance, characteristic function, moments. The cumulative distribution function (cdf) and probability density function (pdf) of a random variable X having Topp-Leone (TL) distribution with shape parameter a > 0 and 0 < x < 1 are given as follows: Since the support of TL distribution is (0, 1), so it can not be used in many areas including in survival analysis. Recently, Sangsanit and Bodhisuwan (2016) and Al-Shomrani, Arif, Shawky, Hanif, and Shahbaz (2016) extended TL distribution as the Topp-Leone generated (TL-G) family of distributions. Suppose, G(x) = G(x|ξ), and g(x) = g(x|ξ) be the baseline cdf and pdf of a continuous random variable X. If TL distribution is considered as a generator then the TL-G family of distributions is obtained by replacing x with G(x) in the TL distribution and is given as below: where f (x) = d dx F (x), g(x) = d dx G(x), support of F (x|a, ξ) is the support of G(x) and ξ is the vector of parameters of baseline distribution G(x) and we can write X ∼ TL-G (a, ξ). Further generalization of TL-G model is considered in the next section. Rezaei, Sadr, Alizadeh, and Nadarajah (2017) generalized the TL-G(a, ξ) distribution by exponentiating the baseline cdf G(x) with an additional shape parameter b > 0. Mahdavi (2017) made generalization of TL-G model with different consideration. Motivated by Cordeiro and de Castro (2011) , in this article, the generalization of TL-G family due to Rezaei et al. (2017) is called as the generalized Topp-Leone-G (GTL-G) family of distributions. So the GTL distribution is considered as the generator of the new family, GTL-G model. The cdf, survival function (sf) and pdf of GTL-G model can be written as:

Generalized Topp-Leone-G model
where, shape1 a > 0, shape2 b > 0, and ξ is the vector of parameters of baseline distribution G(x), support of F (x|a, b, ξ) is the support of G(x) and we can write X ∼ TL-G (a, ξ). The GTL-G model will be tractable if the baseline cdf G(x) and pdf g(x) have simple analytic forms. Considering the cdf of Weibull distribution G(x) as the baseline distribution, the GTL-W distribution is derived and discussed as a survival model with Bayesian perspective.

Random number generation from GTL-G model:
Suppose that U ∼ Uniform(0, 1). Then This is the general expression for generating random numbers from GTL-G model for any baseline cdf G(x).

Weibull survival model
Suppose, survival time T follows Weibull distribution with shape α(> 0) and scale parameter λ(> 0). Then the pdf g(t), cdf G(t), survival function (sf)Ḡ(t) and hazard function (hf) h(t) of Weibull distribution are given as below (Lawless 2003): g(t|α, λ) = (α/λ)(t/λ) (α−1) exp[−(t/λ) α ], t > 0 h(t|α, λ) = (α/λ)(t/λ) (α−1) and we write T∼ Weibull(α, λ). The Weibull distribution is one of the basic lifetime distributions used in survival analysis. The hazard function of Weibull distribution is monotoneincreasing or decreasing. Moreover, its survival and hazard functions can be expressed explicitly and as such, it can handle censored data easily and effectively. Because of flexibility and tractability of hazard and survival functions, Weibull model is widely used in modelling reliability and survival data. But the distribution is not suitable for analyzing data having bathtub shaped and unimodal shaped hazard rates. So to capture different types of hazards, more flexible generalization of the Weibull model is essential in fitting skewed survival data.

Weibull AFT model
The accelerated failure time (AFT) model considers linear regression of logarithm of survival time T on a number of covariates and is used for studying the effect of a covariate that accelerates or decelerates the survival process. The AFT models are parametric models as they depend on the distribution of survival times. The AFT model postulates a direct relationship between failure time and covariates (Kalbfleisch and Prentice 2002). The models are becoming popular as they directly explain reduction or prolongation of the time to event. If survival time T depends on a number of covariates x 1 , x 2 , . . . , x p then the AFT model can be written as: where, β j , j = 0, 1, 2, . . . , p are the regression coefficients, σ(> 0) is a scale parameter, and random error Z has a specified probability distribution. For instance, if Z has an extreme value distribution then T has Weibull distribution (Lee and Wang 2013;Collett 2015).
Suppose that a random variable Z has a standard extreme value distribution with density function g(z) = exp[z − exp(z)] and survival function S(z) = exp [−exp(z)]. Substituting z = (logt − x β)/σ in the extreme value distribution, Weibull AFT model, T ∼ Weibull( 1 σ , exp(x β)), is obtained as follows: Considering the Weibull AFT model as the baseline model, GTL-W AFT model is developed. Aryal, Ortega, Hamedani, and Yousof (2017) introduced Topp-Leone generated Weibull distribution and illustrated with real data sets using maximum likelihood method of estimation and showed that the model fit the data better than the other competitive models. GTL distribution is considered as a generator of the new family of models and Weibull AFT model is considered as the baseline model G. The generalized Topp-Leone-Weibull (GTL-W) AFT model is obtained by substituting Weibull AFT model in the GTL-G model. The cdf, sf, pdf and hf of the GTL-W AFT model can be written as follows:

Description of data
Survival times of 80 patients with diffuse histiocytic lymphoma according to stage of tumourstage III and stage IV are given in Table 1. The death times are recorded in days ( indicates censored time). Vidakovic (2011) and Armitage, Berry, and Matthews (2002) discussed the data from different aspects and the data was first reported by McKelvey, Gottlieb, Wilson, Haut, Talley, Stephens, Lane, Gamble, Jones, Grozea et al. (1976). The aim of this study is to fit GTL-W AFT model for comparing the survival patterns of two groups of patients under Bayesian framework and the fitted model is compared with its sub-model and baseline Weibull AFT model on the basis of LOOIC and WAIC.

Bayesian analysis of GTL-W AFT model
The basic assumption of Bayesian statistics is that parameters θ in a probability model are random variables having prior distributions π(θ) while the observed data drawn from a sampling density are considered as fixed. Bayes Theorem is the main instrument of Bayesian statistics that establishes the relationship among the prior distributions, likelihood and the posterior distribution.

Likelihood function of GTL-W AFT model
The likelihood function of GTL-W AFT model for a right censored sample is derived as (Collett 2015;Lawless 2003;Wang, Ryan, and Faraway 2018): where, S(t i |a, b, σ, β, x), f (t i |a, b, σ, β, x) and h(t i |a, b, σ, β, x) of GTL-W AFT model are given in Equation (12,13,14), θ = (a, b, σ, β), D = (t, δ, X), the censoring indicator δ = 0 if the observation is censored and δ = 1 if the observation is failed, and X is the matrix of covariates which is known as design matrix or model matrix. The contribution of data to the posterior distribution is ensured through the likelihood.
Taking logarithm of both sides of likelihood function, the log-likelihood can be written by the following two alternative equations,

Prior distributions of GTL-W AFT model parameters
Selecting prior distributions of the parameters of a probability model of data is very important as they might have significant influence on the posterior distribution of the parameters. GTL-W density has two shape parameter(a, b), a positive scale parameter (σ) and regression coefficients (β). So prior distributions for the parameters θ = (a, b, σ, β) are required to be selected. For scale parameter σ, half-Cauchy distribution and for the regression coefficients β, normal distribution are considered as weakly informative prior distribution (Gelman et al. 2006). But Cauchy sampling process may not converge to a stable mean and variance (McElreath 2015). So in this article we consider half-t distribution (Wiper, Girón, and Pewsey 2008) as prior for the scale parameter and for the shape parameters as well. Half-t distribution with one degrees of freedom is equivalent to half-Cauchy distribution. The hyperparameters of the prior distributions are chosen so that priors are almost flat (Figure 1), as such, the data dominate the posterior through the likelihood.

Posterior distribution of GTL-W AFT model parameters
The joint posterior distribution of the parameters θ = (a, b, σ, β) = (a, b, σ, β 0 , β 1 , . . . , β p ) of GTL-W AFT model given the data can be written following Bayes Theorem as: where the likelihood L(θ|D) is given by Equation (18), parameters are assumed to be independent.
The normalized joint posterior distribution and the marginal distributions of the parameters are difficult to obtain analytically that requires solving high dimensional integration and, as such, Markov chain Monte Carlo (MCMC) simulation technique is employed to get the estimates and other relevant results.

Rationale of using Bayesian methods to survival analysis
Bayesian methods are becoming increasingly popular among the survival analysis researchers. Ibrahim, Chen, and Sinha (2001) and Li and Ma (2013) highlighted some reasons of applying Bayesian methods to survival analysis. As the survival time is non-negative, usually positively skewed and censored, so the least squares method is not applicable for analyzing survival data. The likelihood function is the most important way of handling survival data that considers each of the censored and uncensored observations. But the likelihood function for censored data is complicated and different from likelihood function for a sample of complete observations. The maximum likelihood method of estimation can be applied and sampling distribution can be derived for inference procedures only for simple distributions for some censoring schemes, not for all types of censoring. Large sample asymptotic sampling distribution is difficult to derive for censored data, even that is not applicable for small sized censored sample. Bayesian methods are very useful in survival analysis as it can analyze censored survival data effectively and efficiently irrespective of sample size and can handle complex models with any number of parameters using MCMC algorithms.

Stan code for fitting Bayesian GTL-W AFT model
Stan is a probabilistic programming language for Bayesian analysis. It uses no-U-turn sampler (NUTS), an adaptive form of Hamiltonian Monte Carlo (HMC) sampling that is more efficient than other Metropolis-Hastings algorithms, specially for high-dimensional models regardless of whether the priors are conjugate or not (Carpenter, Gelman, Hoffman, Lee, Goodrich, Betancourt, Brubaker, Guo, Li, and Riddell 2017;Hoffman and Gelman 2014). It is easier to specify the constraints of the parameters in Stan. A prior can be specified in Stan, but if prior is not specified, Stan uses uniform prior for each parameter. Bayesian softwares BUGS or JUGS is flexible enough to specify a large variety of models, but for large data sets and complex models they might take too long computing time or even may fail to provide solutions. Stan can handle large data sets and complex models in less computing time than BUGS. The number of iterations needed to achieve convergence is lower using Stan. Stan does not allow discrete parameters. As Stan is a recently developed software, not much literature is available on it. Gelman et al. (2013), Kruschke (2015), Korner-Nievergelt, Roth, Von Felten, Guélat, Almasi, and Korner-Nievergelt (2015) and Turkman, Paulino, and Müller (2019)  Data structure for Stan to fit GTL-W model: Stan requires a data list that might include a matrix, vector and values. So, data must be prepared to feed into Stan. An object list is constructed using R and assigned it to dat1 (say).

Convergence of MCMC algorithm for GTL-W AFT model:
An MCMC algorithm simulates samples from the complex posterior distribution. So an algorithm is converged to the target posterior distribution means that the Markov chain is stationary and adding more samples will not change the location and shape of the density of the posterior distribution sensibly and hence will not change the estimates and other relevant results. Quantitatively, potential scale reduction factor, R (Gelman and Rubin 1992), is used for checking convergence of MCMC algorithms which is defined based on between chain variance and within chain variance. Gelman et al. (2013) recommended acceptable limit of potential scale reduction factorR < 1.1 and effective sample size as n ef f = 100. From the summary features (Table 2), it is evident thatR is less than 1.1, n ef f is greater than 100; moreover, Monte Carlo (MC) error, se mean is less relative to the standard deviations for all of the parameters which is expected, that is, convergence of MCMC algorithm to the posterior distribution has been attained. Graphically, convergence of an MCMC algorithm can be judged, mainly, by trace plot (Gelman et al. 2013;Ntzoufras 2009;Hamra, MacLehose, and Richardson 2013). Trace plot (Figure 2) shows no periodicity indicating convergence of the MCMC sampling process to the joint posterior distribution. That is, MCMC algorithm HMC-NUTS performs correctly to explore the target posterior distribution.

Evaluating Bayesian fitting of GTL-W AFT model:
A fitted Bayesian model is accepted as adequate if it predicts the future observations that are consistent with the present data. Model fit is assessed visually by posterior predictive density (PPD) plots which are made using bayesplot package. From posterior predictive density plots (Figure 3), it is observed that GTL-W AFT model is compatible with the data.
Interpretation of results of fitted GTL-W AFT model: Summary features of the posterior distribution of the parameters of GTL-W AFT model are shown in  will decrease. Estimated value of the coefficient β[2] = −1.046 belongs to the 95% credible interval (−2.000, −0.121) which does not include zero value indicates statistical significance. Moreover, it is seen from the caterpillar plot ( Figure 4) that zero value is not contained in the 95% credible interval for the coefficient β[2], so the coefficient is statistically significant. The acceleration factor is exp(β 2 ) = exp(−1.046) = 0.35(< 1) for a patient with stage IV tumour. Therefore, duration of life of a patient with stage IV tumour would be 0.35 times shorter than that of a patient having tumour of stage III , that is, survival time would be shorter for the patients with stage IV tumour.

Sub-models of GTL-W AFT model
If b = 1 then GTL-W (a, b, 1/σ, exp(x β)) AFT model reduces to TL-W (a, 1/σ, exp(x β)) AFT model, where, is the cdf of baseline Weibull(1/σ, exp(x β)) AFT model. So TL-W AFT model and the baseline Weibull AFT model are analyzed and compared with GTL-W AFT model using Bayesian approach.

Bayesian analysis of sub-models of GTL-W AFT model
Using the same data and same prior distributions the sub-model TL-W AFT and the baseline Weibull AFT model are fitted with little manipulations of Stan code. Necessary results and graphs are shown for comparison.
Fitting TL-W AFT model with Stan and summarizing output: Stanfit object M2 (say) is created for TL-W AFT model. The Stan codes for TL-W AFT model are given in the appendix and the whole code block is saved as stancode_gtlw.
Summary results are reported in Table 3. Posterior predictive density plot are shown in Figure 5. LOOIC and WAIC are reported in Table 5.   Table 4. Posterior predictive density plot are shown in Figure 6. LOOIC and WAIC are reported in Table 5.

Model comparison
Selecting the best model from among the several competitive models is always crucial in Bayesian statistics and in classical statistics as well. Two information criteria Leaveone-out cross validation (LOO) and Widely Applicable or Watanabe Akaike Information Criterion (WAIC) (Vehtari, Gelman, and Gabry 2017;McElreath 2015;Gelman et al. 2013) are usually used to compare the Bayesian fitted models. Pointwise log-likelihoods are calculated in the generated quantities block of Stan program and afterwards 'loopackage' (Vehtari, Gelman, Gabry, and Yao 2018) extracts and uses these quantities to obtain numerical measures LOOIC (LOO information criterion) or WAIC for model comparison and they are presented in the Table 5. A better fitted model has less LOOIC or WAIC than the others. On the basis of LOOIC and WAIC, it is evident that the AFT models TL-W and GTL-W are better fitted models than the baseline Weibull AFT model; TL-W and GTL-W AFT models are almost indistinguishable and might be equally acceptable for analyzing the tumour stage data. As generalizations, GTL-W has 4 parameters, TL-W model has 3 parameters whereas Weibull distribution has 2 parameters. The well-known Occam's principle of parsimony is, " If two models describe the observations equally well, we should choose the simpler one." (Lesaffre and Lawson 2012). Accordingly, TL-W AFT model is the most preferred model.

Conclusion
The generalized Topp-Leone-Weibull (GTL-W), Topp-Leone-Weibull (TL-W) and Weibull survival models are fitted as accelerated failure time models under Bayesian framework to the tumour data. For all the models, the coefficient of tumour stage is statistically significant. Comparing posterior predictive density plots, LOOIC, WAIC and the number of parameters of the models, it can be concluded that the TL-W AFT model is the best suited model for fitting the censored tumour data. The death is accelerated as the acceleration factor is less than one for patients having stage IV tumour, that is, survival time is shorter for the patients with stage IV tumour than that of the patients with stage III tumour (Figure 7).