The Modelling of a Cure Fraction in Bivariate Time-to-Event Data

Three correlated frailty models are used to analyze bivariate timeto-event data by assuming gamma, log-normal and compound Poisson distributed frailty. All approaches allow to deal with right censored lifetime data and account for heterogeneity as well as for a non-susceptible (cure) fraction in the study population. In the gamma and compound Poisson model traditional ML estimation methods are used, whereas in the log-normal model MCMC methods are applied. Breast cancer incidence data of Swedish twin pairs illustrate the practical relevance of the models, which are used to estimate the size of the susceptible fraction and the correlation between the frailties of the twin partners. We discuss future directions of development of the methods and additional thoughts concerning their advantages and use. Zusammenfassung: Drei korrelierte Frailty-Modelle werden benutzt um bivariate Lebensdauerdaten zu analysieren. Dabei werden die Gamma, LogNormal und compound Poisson Verteilung für die Frailty-Variable angenommen. Alle Modelle sind auf rechts zensierte Daten anwendbar und erlauben die Modellierung einer Subpopulation, die dem interessierenden Ereignis gegenüber geschützt ist. Im Gamma und compound Poisson Modell werden traditionelle ML Schätzungen verwendet, während im Log-Normal Modell MCMC Methoden angewendet werden. Daten über Brustkrebs bei schwedischen Zwillingen illustrieren die praktische Anwendbarkeit der Modelle, die insbesondere genutzt werden, um die Größe der geschützten Subpopulation und die Korrelationen zwischen den Frailties der Zwillingspartner zu schätzen. Vorteile, Nachteile und offene Fragen der Forschung im Bereich dieser Modelle werden diskutiert.


Introduction
The Cox proportional hazards model (Cox, 1972) is commonly used in the analysis of survival time data.An often unstated assumption of the proportional hazards model and of traditional frailty models (with the exception of those that use the compound Poisson distribution (Aalen, 1988, Aalen, 1992)) is that all individuals will experience the event of interest.However, in some situations a fraction of individuals is not expected to experience the event of interest; that is, these individuals are not at risk.The terminology to describe the never-at-risk group varies from field to field, but includes 'long-term survivors' or 'cured' in epidemiology, 'non-susceptibles' in toxicology, 'stayers' in finite Markov transition models of occupational mobility, the 'non-fecundable' in fertility models, and 'non-recidivists' among convicted criminals.In epidemiology and medicine, researchers may be interested in analyzing the occurrence of a disease.Many individuals may never experience that disease; therefore, there exists a fraction in the population that is protected.Cure models are survival models which allow for a cured fraction in the study population.
These models extend the understanding of time-to-event data by allowing for the formulation of more accurate and informative conclusions than previously made.These conclusions would otherwise be unobtainable from an analysis that fails to account for a cured fraction in the population.If a cured component is not present, the analysis reduces to standard approaches of survival analysis.Most cure models assume that the susceptible individuals are homogeneous in risk.This paper deals with cure models which include a frailty variable in order to allow for heterogeneity among the fraction under risk.Or, depending on the point of view, this paper deals with frailty models including a nonsusceptible (or cured) fraction in the study population.In this case, the distribution of the frailty is a combination of discrete and continuous distributions.
In cure models, the population is divided into two sub-populations so that an individual is either cured with probability 1 − φ, or has a proper survival function S(t), with probability φ.Here, proper survival function means lim t→∞ S(t) = 0. Individuals regarded as cured will never experience the event of interest and their survival time will be defined as infinity.Therefore, the hazard and survival functions of cured individuals are set to zero and one, respectively, for all finite values of t.A univariate time-to-event model that incorporates a cure fraction is given by Longini and Halloran (1996) have proposed frailty cure models that extend standard frailty models.The frailty random variable in the former has point mass at zero with probability 1 − φ while heterogeneity among those experiencing the event of interest is modelled via a continuous distribution with probability φ.Price and Manatunga (2001) gave an excellent introduction to this area and applied leukaemia remission data to different cure, frailty and frailty cure models.They found that frailty models are useful in modelling data with a cured fraction and that the gamma frailty cure model provides a better fit to their remission data compared to the standard cure model.
The following example provides an extension of the above model to include censored observations.Consider two types of expressions for a disease, the incidence and the age of disease onset.Risk models for overall susceptibility (lifetime risk) that consider only the first expression by treating the disease as a binary trait of being affected or not can give wrong results because, for individuals without the disease, due to censoring, it is often not known whether they will eventually develop the disease.On the other hand, models from survival analysis typically assume that everyone has the same susceptibility to the disease and will eventually be effected if the follow-up is sufficiently long.These models possibly do not accurately describe the disease risk factors.In models dealing with both types of expressions, the effect of a covariate can act on either the overall susceptibility or the age at onset or both.In general, cure models are special cases of the binary (two-point) frailty model.Chatterjee and Shih (2001) considered an extension of such univariate frailty cure models to a bivariate setting.They used three different copulas in their two-step analysis procedure.Wienke et al. (2003b) suggested the use of a correlated gamma frailty model (Yashin and Iachine, 1995;Yashin et al., 1995;Wienke et al., 2003a).
The main task of the present paper is the estimation of the size of the susceptible fraction with respect to breast cancer by comparing the results of the correlated gamma cure, the correlated log-normal cure and the correlated compound Poisson frailty model.
In the next section we describe the proposed models, then provide an application of the models to breast cancer data from the Swedish Twin Registry in section three.The paper concludes with a discussion of further applications, drawbacks and advantages of the models.

Correlated Gamma Frailty Cure Model
A bivariate shared frailty cure model for familiar association in diseases was established by Chatterjee and Shih (2001).We use their notation here and define for a pair of individuals and let T j denote the age at onset for the j-th individual when V j = 1 (j = 1, 2).In terms of hazard rates, the model is given by the relation with Z j denoting the frailty of the j-th individual in the pair.Chatterjee and Shih (2001) used three different copulas in their approach: the shared gamma frailty model (Clayton, 1978), Frank's copula and Hougaard's shared positive stable frailty model (Hougaard, 1986).A two step estimation procedure was applied.Their model was partly extended by Wienke et al. (2003b), who substituted the shared gamma frailty model by the correlated gamma frailty model.Furthermore, all parameters were estimated in a one-step maximum likelihood procedure.The form of the baseline hazard is important because all methods described below are parametric.In principle, any parametric formula for a hazard rate is possible (e.g., Gompertz, Gompertz-Makeham, Weibull, exponential, piecewise constant).A vast literature on human mortality and time to onset of specific diseases suggests using the Gompertz hazard rate.For that reason and to save space, we investigate only bivariate frailty models that have the Gompertz baseline hazard rate.Let The bivariate survival function S(t 1 , t 2 ) is the conditional survival function of pairs with both individuals susceptible, e.g.given the event {V 1 = 1, V 2 = 1}.The likelihood function of bivariate right censored lifetime data in this model is given by The form of S(t 1 , t 2 ) depends on the frailty distribution.The correlated gamma frailty model (Pickles et al., 1994;Yashin and Iachine, 1995;Yashin et al., 1995) is developed for the analysis of multivariate failure time data, in which two associated random variables are used to characterize the frailty effect for each cluster.To be more specific, let k 0 , k 1 be some real positive variables.Set are the frailties of individual 1 and 2 in a pair.The bivariate survival function of this model is given by where S(t) denotes the marginal univariate survival function, assumed to be equal for both twin partners and 0 ≤ ρ ≤ 1 holds.Furthermore, it holds that ρ = corr(Z 1 , Z 2 ), E(Z j ) = 1 and σ 2 = var(Z j ).Obviously, the shared gamma frailty model by Clayton (1978) is a special case of (4) when ρ = 1.

Correlated Log-Normal Frailty Cure Model
The correlated log-normal model is much more flexible than the correlated gamma one, because it is not based on the additive composition of the two frailties as used in ( 2) and (3).However, the log-normal distribution does not allow an explicit representation of the likelihood function, which requires more sophisticated estimation strategies.We assume that the two frailties of individuals in a pair are given by where LogN denotes the (bivariate) log-normal distribution.Here, m, s 2 , and r denote the mean, variance and correlation of the respective normal distribution.This distribution can be obtained by assuming a bivariate normal distribution on the logarithm of the frailty vector and N denotes the bivariate normal distribution whose parameters are some functions of the frailty parameters µ, σ 2 , and ρ (see, e.g., Hutchinson and Lai, 1991) For reasons of identifiability of the model let µ = 1.It follows from ( 7) to ( 9) that There exists no closed form expression for the bivariate survival function S(t 1 , t 2 ).

Correlated Compound Poisson Frailty Model
The third bivariate model which we consider here is the correlated compound Poisson frailty model.The univariate compound Poisson distribution was introduced by Aalen (1988Aalen ( , 1992) ) as a frailty distribution.An interesting property of the model is that it yields a subgroup of individuals with zero frailty, who survive forever.Despite the fact that the density of the continuous part is only given as an infinite series which has to be calculated numerically, the distribution is mathematically convenient.It may also be seen as a natural choice.The distribution can be constructed as the sum of a Poisson distributed number of independent and identical gamma distributed random variables.The Laplace transform of the compound Poisson distribution is given by The notation cP(γ, k, λ) is used for a compound Poisson distribution.Expectation and variance of a compound Poisson distributed random variable Z are The Laplace transform given above implies the marginal survival and hazard function in case of a compound Poisson frailty model Using the constraint E(Z) = 1 and relationship σ 2 = (1 − γ)/λ (see ( 10)) it holds Austrian Journal of Statistics, Vol. 35 (2006), No. 1, 67-76 It should be noted that the integral of λ(t) over [0, ∞) is finite when γ < 0. Consequently, the survival function is incomplete because a fraction of individuals have zero frailty, who could never experience the event under study.
To introduce the bivariate model, let Y 0 , Y 1 , Y 2 be independent compound Poisson distributed random variables with are the frailties of individual 1 and 2 in a pair.The bivariate survival function of this model is given by where S(t) denotes the marginal univariate survival function, assumed to be equal for both twin partners and 0 ≤ ρ ≤ 1 holds.Furthermore, it holds that ρ = corr(Z 1 , Z 2 ) and σ 2 = var(Z j ).
Parameter γ divides the class of distributions in two major subfamilies.For γ ≥ 0, the distribution is a power variance function distribution (PVF) or three parameter distribution.This bivariate correlated frailty model was proposed by Yashin and Iachine (1995) and applied to Danish twin mortality data.The inverse Gaussian model is included as a special case by γ = 1/2.The case γ > 0 is not considered further here, because we expect parameter estimates γ < 0 in our real data example.
The extension to γ < 0 in the univariate case was suggested by Aalen (1988Aalen ( , 1992) and shown to yield the compound Poisson distribution.Parameter values of γ < 0 imply the existence of a non-susceptible fraction in the population.The extended bivariate model is applied to breast cancer data of Swedish twins to estimate the size of the susceptible fraction and results are compared to the analysis by the correlated gamma frailty cure model.

Example
The data set from the Swedish Twin Register contains records of 5857 female twin pairs with both partners being alive in 1959-61.Individuals were followed up from 1959/61 to 27 October 2000.Altogether, we have 2003 monozygotic and 3854 dizygotic twin pairs, and 715 cases of breast cancer were identified during the follow-up.More detailed information about the Swedish Twin Register can be found in Lichtenstein et al. (2002).The results are given in Table 1.We consider the three different correlated frailty cure models described in the last section.In the first and second model the susceptible status of the individuals in a pair is assumed to be independent of each other, e.g., with p 1 , p 2 ∈ {0, 1}.The size of the susceptible fraction is uniquely described by the probability φ = P({V The case of dependence between the susceptible status of the twin partners was already considered in Wienke et al. (2003b) analyzing the same data.
In the first model gamma distributed frailty is used to account for heterogeneity in the population and a cure fraction is included into the model.Parameter σ 2 is a measure of heterogeneity, which is large in this population, and φ as the size of the susceptible fraction, which is found to be around 20%.The gamma frailty model without cured fraction is a special case of the compound Poisson frailty model with γ = 0.A MCMC approach was used to estimate the model parameters.More details about this case can be found in Locatelli et al. (2004).
A log-normal distributed frailty is used and a cured fraction is included in the second model.Parameter estimates are different from that in the gamma case with higher heterogeneity (σ 2 = 4.865) and a smaller susceptible fraction (φ = 0.102).
In the compound Poisson model (third column) the parameter estimates are larger than in the gamma and log-normal model (φ = 0.336, σ 2 = 7.034).

Discussion
In the present paper three different models are applied to the Swedish breast cancer data set.In all three bivariate models the association between the lifetimes of the twins is accounted for.Furthermore, it is possible to estimate the correlation between the frailties of the twin partners, which is large in the correlated gamma frailty cure and the correlated log-normal frailty cure model.It should be noted, that in the gamma and log-normal model this correlation is estimated from pairs only, where both partners are susceptible, which is the reason for the high correlations.In the correlated compound Poisson frailty model the frailty variable is a mixture of a continuous and a discrete part.Consequently, in this model the frailty Z plays the role of V Z in the gamma and the log-normal model.That is why the correlation in the compound Poisson model is much lower, because it is the correlation in all pairs including non-susceptible individuals.This problem is already discussed by Chatterjee and Shih (2001) in detail.The correlated gamma and log-normal frailty cure model incorporate two types of association, one between the lifetime risk or the overall susceptibility of two individuals (corr(V 1 , V 2 )) and one between the frailties of onset between susceptible individuals (corr(Z 1 , Z 2 )).In the correlated compound Poisson frailty model these two types of correlation are combined and can not be separated.In all models correlations in monozygotic pairs are higher than in dizygotic pairs, but the differences are small.This is in line with the well known fact that the influence of genetic factors on susceptibility to breast cancer is small (5 -10 %).
The estimates of the size of a susceptible fraction (due to breast cancer) are close to the estimate φ = 0.22 (0.0093) in the parametric model found by Chatterjee and Shih (2001) in a study population that is completely different.
Interestingly, parameter estimates are quite different depending on whether the frailty distribution among the susceptible individuals is chosen as gamma, log-normal or compound Poisson.In all considered cases only a relatively small fraction (10% -34%) of all women is indicated to be susceptible to breast cancer.Nevertheless, the estimates of the susceptible fraction in all three models in Table 1 are close to the figures obtained by Farewell (1977) for different combinations of four risk factors.The authors found that, if none of the risk factors is present the susceptible fraction is around 0.015, if all risk factors are present, the estimate increases to 0.272.
The size of the fraction of woman susceptible to breast cancer is given by the model parameter φ in the gamma and log-normal frailty cure model.This parameter does not exist in the compound Poisson frailty model.Here in this model the size of the susceptible fraction can be calculated by φ = 1 − exp((1 − γ)/(γσ 2 )).The estimates for the size of the susceptible fraction have to be compared with the overall lifetime risk of breast cancer, which is around 8% -12% in current western populations (Harris et al., 1992;Feuer et al., 1993;Rosenthal and Puck, 1999;Ries et al., 1999).The twin population considered here was born between 1886 and 1925 when the lifetime risk for breast cancer was lower because of competing causes of death like infections, which are much less important today.
The newly introduced correlated compound Poisson frailty model offers a very elegant approach to integrate the concept of cure models into frailty modelling.The survival function is explicitly available and of easy form which allows traditional maximum likelihood parameter estimation.This is the most important advantage of the suggested model compared to the recently introduced model by Moger and Aalen (2005).Important frailty models like the gamma model (γ = 0) and the inverse Gaussian model (γ = 1/2) are included in this model family and provide a great flexibility of the model.Simulation studies are needed to analyze the behavior of the parameter estimates in this model.Different parameter estimation strategies (ML, ML with numerical integration, MCMC) are already analyzed in correlated frailty models without a cure fraction by Wienke et al. (2005).Of great interest would be a non-parametric version of the correlated compound Poisson frailty model, where the baseline hazard functions are not specified.This will be part of future research in this direction.
Cure models suffer from an inherent identifiability problem with right censored observations: The event under study has not occurred either because the person is insusceptible or the person is susceptible, but follow-up was not long enough to observe the event.
The identifiability problem is growing with increasing censoring, but is reduced by the parametric modelling of the baseline hazard.In cure models with fixed censoring times (caused by ending the study) censoring is no longer non-informative even in case where the censoring and the survival times are independent.The proportion of censored observations contains important information about the parameters in the model.For example, in the (usual ideal) case of no censoring, it holds φ = 1.