An Application of the Cox-Aalen Model for Breast Cancer Survival

Semiparametric hazard function regression models are among the well studied risk models in survival analysis. The Cox proportional hazards model has been a popular choice in modelling data from epidemiological settings. The Cox-Aalen model is one of the tools for handling the problem of non-proportional effects in the Cox model. We show an application on Piedmont cancer registry data. We initially fit standard Cox model and with the help of the score process we detect the violation of the proportionality assumption. Covariates and risk factors that, on the basis of clinical reasoning, best model baseline hazard are then moved into the additive part of the Cox-Aalen model. Multiplicative effects results are consistent with those of the Cox model whereas only the Cox-Aalen model fully represents the timevarying effect of tumour size. Zusammenfassung: Semiparametrische Regressionsmodelle gehören zu den gut untersuchten Modellen der Überlebensanalyse. Das Cox Proportional Hazards Modell wird häufig zur Modellierung von epidemiologischen Daten herangezogen. Die Verwendung des Cox-Aalen Modells ist eine Möglichkeit, um das Problem von nicht-proportionalen Effekten im Cox Modell zu behandeln. Wir zeigen eine Anwendung dieser Modelle auf Daten des Krebsregisters Piemont. Anfangs wird ein Cox Modell angepasst und mit Hilfe des Score-Prozesses werden Verletzungen der Proportionalitätsannahme entdeckt. Kovariable und Risikofaktoren, die aufgrund von klinischen Gründen am besten den Baseline Hasard beschreiben, werden dann im additiven Teil des Cox-Aalen Modells modelliert. Die Ergebnisse der multiplikativen Effekte entsprechen denen des Cox Modells, aber nur das Cox-Aalen Modell kann den zeitabhängigen Effekt der Tumorgröße vollständig repräsentieren.


Introduction
In survival analysis, regression models are employed to explain the occurrence of events over a period of time taking various explanatory variables into consideration.The semiparametric Cox (1972) proportional regression model is the cornerstone of modern survival analysis and even if many alternatives exist in statistical literature, like the additive Aalen (1980Aalen ( , 1989) ) model, these are mostly overshadowed by it.In any case, the assumption of constant effect over time, either additive or multiplicative, of categorical covariates is not correct in many medical contexts.Zahl (2003) showed how unlikely it might be to assume that high risk factors are multiplicative, at least in cancer survival analysis.For example, one covariate may initially increase the hazard rate and it may later change, becoming even protective.Aggressive cancer treatments using high-dose chemotherapy or bone marrow transplant operate in this way.Alternatively, some covariates may initially decrease the hazard rate, but the long-term effect might be adverse as that of treatment in malignant lymphoma.
Both Cox and Aalen models were extended in several directions.Many authors considered the Cox model with time-varying effects, whereas others tried to combine Cox's and Aalen's model.
Multiplying the additive and multiplicative hazards models, Scheike and Zhang (2002) studied the model which will be referred to as Cox-Aalen.For this model, some covariate effects are believed to result in multiplicative effects whereas other effects are best described as additive.This approach extends the traditional Cox model by allowing the baseline intensity to depend on covariates through the additive Aalen model.It further extends the additive Aalen model, providing a very flexible and useful class of models.When some covariates are known or expected to have strongly time-varying effects, we can include them in the additive part of this model, thus utilizing the higher flexibility of the Aalen model.Other covariates can potentially be included in the multiplicative part of the model.An alternative way of thinking of this model is to consider it as an approximation to the general stratified hazard model suggested by Dabrowska (1997).
Goodness of fit of multiplicative effects must be investigated by appropriate tests, usually based on asymptotical arguments (Scheike and Zhang, 2003).
In this paper we present an application of the Cox-Aalen model on a breast cancer cohort identified by the population-based Piedmont (north-western Italian region) Cancer Registry.In general, the aim of a population-based cancer registry is to collect information on every case of cancer identified within a specified population over a given period of time.The data are validated by a rigorous scheme of procedures, and a large amount of resources are invested to ensure complete follow-up.Each cancer case is potentially observed from diagnosis (entry time) until death occurs.
When the failure (i.e.death) is observed for each subject, we have complete data on the failure (or survival) times.However, a certain time interval or follow-up period is usually specified when updating survival and some cases may not have experienced the event by that time.Besides, some patients could move out of the registry area and their lifetimes would not be followed entirely.Therefore, data collected by a cancer registry consist of two part: the complete survival times and the right-censored survival times.
Cancer registry data may not be appropriate for studies in cancer epidemiology and health care quality assessment which require detailed information on exposures, comorbidity and health care utilization.In these situations, record-linkage procedures between cancer registry data and other sources such as hospital discharge data and screening programs data have to be applied.
Breast cancer is the most common malignancy in women, accounting for about 25% of all female cancers.Incidence is highest in northern Europe (age-standardised incidence rate 90-100 per 100 000) and declines from the north to the south of the continent.The prognosis for breast cancer is relatively good, with 5-year relative survival exceeding 75% in most countries of Western Europe (80% in Italy) (see Sant et al., 2003).
For this cancer site, covariates measured at diagnosis may eventually change their influence on survival during the follow-up period as a consequence of treatments (such as surgery, chemotherapy and radiation).
Moreover, given that the risk proportionality assumption is known to fail gradually as the follow-up period increases, analyses that take into account time-varying effects are required.We suggest the Cox-Aalen model, with its excess risk interpretation of the additive part and standard relative risk interpretation of the multiplicative part, could accomplish this task in a flexible way.
The outline of the paper is as follows.The second section gives a definition of the Cox-Aalen survival model whereas the third shows its application to analyzing survival of patients with breast cancer.The final section contains concluding remarks.

Cox-Aalen Survival Model
The multiplicative hazards models encompass the well-known proportional hazard model, introduced by Cox in the context of survival data and later extended to the counting process framework (Andersen and Gill, 1982).
The Cox regression model assumes that the intensity process (or hazard function) for the i-th subject is where Y i (t) is an at-risk indicator process, λ 0 (t) is the baseline intensity, X i (t) ∈ R p is a covariate vector, possibly changing over time, and β ∈ R p are the regression coefficients.Both Y i and X i are assumed to be left-continuous processes (predictable).If the covariates are time-independent the model assumes that the hazard rates for different values of the covariates are proportional.If we consider the special case where p = 1 and X 1 is timeinvariant, for example a disease indicator, the hazard rate (or relative risk) is the ratio which is seen not to depend on time because only the baseline intensity reflects this dependency.This proportionality assumption should be scrutinized carefully, since covariate effects often do not lead to constant relative risk (Houwelingen, 2000;Kvaloy and Neef, 2004).
Statistical inference in the Cox model is primarily based on maximum partial likelihood.Suppose that there are no ties between the event times and that censoring is noninformative.Let t (1) < • • • < t (N ) be the ordered event times and let R j the risk set at time t (j) , that is the set of individuals who have not experienced the event (not failed) or been censored by that time.Further let X(t (j) ) denote the covariate vector for the individual who fails at time t (j) .The partial likelihood (Cox, 1972) is thus expressed by . (1) By maximizing (1) and then taking partial derivatives with respect to the β's, we get the efficient score equations.The maximum likelihood estimates are the solutions to the p nonlinear score equations U h (β) = 0, h = 1, . . ., p.This can be done numerically using iterative methods such as Newton-Raphson technique.
With the parameter β known (or replaced by the maximum partial likelihood estimate), the cumulative baseline hazard function Λ(t) = t 0 λ(s)ds can be estimated by the Breslow estimator, which takes the same form as the Nelson-Aalen estimator.
The model may be extended by allowing the regression coefficients to be time-varying but this issue could be quite hard to settle, both theoretically and practically.
For the Cox model many goodness of fit tests were developed.Among others, we mention the tests on parameters or those on the functional form of a covariate (a survey of these tests can be found in Therneau and Grambsch, 2000).Besides these tests focused on specific departures from the Cox model, there are also global tests; some of them are based on the doubly cumulative hazard function (McKeague and Utikal, 1991) and others are based on the martingale residual process (Marzec and Marzec, 1997).
Another approach to modelling survival data is to assume that covariates act additively on the hazard.The covariates are assumed to impact additively upon a baseline hazard but the effects are not forced to be constant.The impact is therefore allowed to vary freely over time according to the underlying intensity where α(t) is a nonparametric p-dimensional regression function that is constrained by λ i (t) ≥ 0. Since α(t) is allowed to change over time, the partial likelihood approach no longer works for finding an estimator.The idea in this case is to find an estimator for the integrated or cumulative regression function by a type of least squares method and smooth the estimator (for example, by kernel smoothing) to obtain an estimator for α(t).Crude estimates of the regression functions can be found by examining the slope of the fitted A(t)'s estimates.Aalen (1980) proved that a reasonable estimator of A(t), usually referred as the ordinary least square estimator (OLSE), is given by where I k is a column vector consisting of zeros except for a one in the place corresponding to the subject who experiences the event at time t k and the matrix X(t) is a generalized inverse of Y (t).Obviously, the estimator is only defined over the time interval where Y (t) has full rank.In terms of asymptotic efficiency, the weighted-least-squares estimator (WLSE) introduced by Huffer and McKeague (1991) is a better choice.
Goodness of fit procedures are mainly based on martingale residuals (Aalen, 1993).These residuals can be defined up to the time t R when the matrix Y (t) looses its full rank.Let N i (t) be a counting process for the i-th individual, indicating whether the event has happened.The process equals 0 up to the event time and 1 afterwards.In the case of censoring, it always remains equal to 0. The definition of the martingale residual process for the i-th individual may be written as: where Λ * i (t) represents an estimator of the cumulative intensity.The value of (3) at the final estimation time t R is denoted as martingale residual.The rationale for the above definition is that one compares the observed counting process with the expected intensity.
Additive and proportional hazard models postulate a different relationship between hazard and covariates and the subject matter rarely indicates which of the models are preferable.Both techniques may often be used to complement each other and to provide different summary measures.One advantage of the additive model is that time-varying effects are easy to estimate and no smoothing parameter need to be chosen.Scheike and Zhang (2002) suggest a model that combines the multiplicative and additive model, the Cox-Aalen model, where the intensity is on the form Some covariates effects thus work additively on the risk and other covariates are allowed to have multiplicative effects.Approximate maximum likelihood estimators of the baseline intensity functions and the relative risk parameters of the Cox model are derived by solving the score equations.For a detailed description of the estimation procedure we refer to Scheike and Zhang (2002).Here we briefly summarize their idea of finding an estimator for β, α(t) and some goodness of fit procedures.
We start by making the observation that for known β, we can use the Huffer and McKeague (1991) estimator for the cumulative intensities defined in (2), which takes the form where N (t) is a multivariate counting process and and Z i (t) ∈ R q is a covariate vector.The weight matrix where h i (t) are known functions not depending on β.
We now solve the joint score equations (derived from the log-likelihood function) for β and α(t).This is done by solving the score equation for α(t) for given β and then substituting this solution into the score equation for β.The score equation for α(t) is given by Solving for a given β and W (t) as above yields the Huffer-McKeague estimator.Inserting this estimator of the cumulative intensity (4) into the score for β, then β is defined to be the solution to the equation: The initial estimates β and d Â(t) may be computed with simple weights that do not require the knowledge of α.A particularly simple choice of the weights to use as primary estimators is to set h i (t) = 1 for all i.The maximum likelihood weights have h i (t) = X T i (t)α(t) and they can be estimated by smoothing d Â(t) to obtain α(t) and setting h i (t) = X T i (t)α(t).This procedure is carried out iteratively to find final estimates of β and α(t).
To evaluate the goodness of fit of the covariates included in the multiplicative part of the Cox-Aalen model, Scheike and Zhang consider cumulative score processes as in Wei (1984) and later in Lin et al. (1993).The authors base their inferential procedures on asymptotic distributions of the observed score process U ( β, t), derived and approximated by simulations.If the model is proportional for the j-th covariate, then the j-th component of the score process should behave as under the null.A test statistics may be constructed by considering: To test if covariates that are included in the additive part of the model are significant, departures of Âj (s) from the null are considered.Similarly to investigate if an additive component has a time-varying effect, departures between Âj (s)/s and one estimate of the constant effect under the null are analyzed.

Survival from Breast Cancer
We identified incident breast cancer cases, aged 40-79 and diagnosed from 1997 through 1999, from Piedmont Cancer Registry data within a high resolution study in co-operation with the Breast Cancer Screening Program.We performed a deterministic record-linkage between these patients and their respective hospital discharge records.Out of this initial cohort, 1704 surgical cases (80%) were linked to the discharge abstract database.The linkage provides a valuable source of information to analyze comorbidity, cancer treatments and hospital characteristics.Among these cases, 285 died, 4 were lost to follow-up and the remaining 1415 were still alive at 31st July 2003.Risk factors for death were age at diagnosis (mean of 62 years), tumour size categorized into six groups (≤ 2cm, > 2 to 5cm, > 5cm, extension to chest wall/skin, in situ, missing), (see International Union Against Cancer (UICC), 1997), regional lymph nodes involvement (negative, positive or not assessed), presence of distant metastases and comorbidity score (Romano et al., 1993) categorized into three groups (0, 1 or > 1).Table 1 describes the patients characteristics at the date of cancer diagnosis (baseline).
We initially fitted a Cox regression model with all the observed risk factors included in the linear predictor.According to the Grambsch and Therneau test (GT), proportionality of covariate effects is not satisfied for tumour size and presence of metastases, resulting  2).The overall p-value is 0.04.To increase model flexibility and possibly avoid to force proportionality assumption for some covariates, a Cox-Aalen model was introduced.
We considered the model where tumour size, presence of metastases and age at diagnosis had an additive effect while nodal involvement and comorbidity score had a multiplicative effect.The choice on which factor to be modelled additively or multiplicatively was based not only on residual analysis and goodness of fit evaluation but also on clinical reasoning.In fact, even though Grambsch and Therneau test did not reject the hypothesis of risk proportionality for age at diagnosis (p-value 0.48), this factor is supposed to strongly influence baseline risk and therefore seemed natural to include it in the additive part of the model.
Constant multiplicative effect estimates under the Cox model were in agreement with those in the multiplicative part of the Cox-Aalen model (see Table 2).The score processes to test risk proportionality for nodal status and comorbidity score are plotted in Figure 1 along with 50 random realizations under the null-hypothesis of constant multiplicative effects.The score processes are clearly inside the expected values as testified by the correspondent score test p-values reported in Table 2.
The results of the test for non-significant effects, appropriate when the effects give monotone cumulative and conducted by testing if the cumulative effect is 0 at the endpoint, are shown in Table 3. Almost all additive terms are significant or borderline significant (as metastases presence).
When plotted against time, the slope of A(t) provides information about the influence of the covariate.In particular, if α(t) is constant (or time-invariant), then the plot should approximate a straight line.Table 3 reports the p-values of the test based on this rationale.
Introducing an additive term via the Aalen model suggests that tumour size has a time-varying effect (see Table 3).During the first 18 months after diagnosis its effect is not significant, later it becomes an important predictor for death.Figure 2 illustrates the cumulative additive effect of tumour size (2-5cm and > 5cm) with 95% pointwise confidence bands.

Discussion
The assumption that the relative risks are constant over time may not hold in practice, at least in cancer survival.There are various way of circumventing this assumption, but most of them are somewhat ad hoc.
If the proportional hazard assumption is critical for a categorical covariate, the most commonly applied approach is to apply the stratified Cox model.The baseline function is thus replaced with as many baseline functions as the number of covariate-strata.In this way, various non proportional developments of the relative risk with time may be reflected.However, when dealing with critical continuous covariates, this method is not really satisfactory since an arbitrary categorization is required.
The phenomenon of time-dependent effect of a continuous covariate can actually, to a certain degree, be modelled by the use of time-dependent covariates.If we suppose that relative risks change at some specific points in time, we can fit a Cox model with piecewise constant time-varying effects, getting different risk estimates for each critical interval.The drawback is that the cut points will not be known a priori but need to be chosen in an ad hoc manner.Our approach is to model in a more flexible way via the Cox-Aalen model, moving all the non proportional effects to the additive part of the model.Instead of having a simple baseline intensity, this extended model uses Aalen's model as its covariate dependent baseline.Another important advantage is that even though covariates are allowed a non-parametric effect, the hassle and difficulty of finding smoothing parameters are not needed.
In the analysis of survival from breast cancer we carried out, not only covariates for which a violation of the proportionality assumption was detected, but also risk factors that most influence a baseline risk were included in Aalen's part of the model.The resulting Cox-Aalen model had an additive part consisting of a baseline, age at diagnosis and tumour size.The proportional part of the model contained regional lymph nodes involvement and comorbidity score.Even though proportionality in age at diagnosis effect was satisfied, this factor is supposed to strongly influence baseline risk and therefore it seemed natural to include it in the additive part of the model.Multiplicative effects were consistent with those of Cox model whereas tumour size showed a time-varying effect.

Figure 1 :
Figure 1: Score test for multiplicative effect for regional lymph nodes involvement and comorbidity score in Cox-Aalen model and 50 randomly chosen processes (grey lines) under the null of a constant multiplicative effect.

Table 2 :
Constant multiplicative effects, standard errors and proportionality test under the standard Cox model and under the Cox-Aalen model(*).

Table 3 :
Additive effects under the standard Cox-Aalen model.Non significant effects evaluated at the endpoint.