The Cause Specific Hazard Quantile Function

In this paper, we discuss modeling and analysis of competing risks data using the quantile function. We introduce and study the cause specific hazard quantile function. We present competing risks models using various functional forms for the cause specific hazard quantile functions. A non-parametric estimator of the cause specific hazard quantile function is derived. Asymptotic properties of the estimator are studied. Simulation studies are carried out to assess the performance of the estimator. Finally, we apply the proposed procedure to real life data sets.


Introduction
In survival studies, the failure of a subject can often be attributed to more than one cause. Competing risks models are usually employed to analyze such data. In the competing risks set up, for each subject under study we observe a random vector $(T, J)$, where $T$ represents the lifetime (possibly censored) and $J \in \{1, 2, \ldots, k\}$ denotes the cause of failure. Assume that the causes of failure are mutually exclusive. Two frameworks are commonly employed to deal with standard competing risks data, viz. cumulative incidence function formulations and cause specific hazard formulations.
The cumulative incidence function $F_j(t)$ is the probability of failure by time $t$ due to cause $j$, given by

$$F_j(t) = P[T \le t, J = j], \quad j = 1, 2, \ldots, k. \tag{1}$$

Note that $F(t) = \sum_{j=1}^{k} F_j(t)$ is the distribution function of $T$. The cause specific hazard function $h_j(t)$ of $T$ is defined as

$$h_j(t) = \lim_{\Delta t \to 0} \frac{P[T < t + \Delta t, J = j \mid T \ge t]}{\Delta t}, \quad j = 1, 2, \ldots, k. \tag{2}$$

The function $h_j(t)$ is the instantaneous rate of failure due to cause $j$ at time $t$, given that the subject has survived up to time $t$.
Let $f_j(t) = \frac{d}{dt} F_j(t)$ be the cause specific density of $T$. If the density $f_j(t)$ exists, (2) can be written as

$$h_j(t) = \frac{f_j(t)}{S(t)}, \quad j = 1, 2, \ldots, k, \tag{3}$$

where $S(t)$ is the survival function of $T$. Another important function used in the analysis of competing risks data is the sub-survival function $S_j(t)$, defined by (4). The function (4) does not represent a proper survival function of an observable random variable (Lawless (2003)). Further, $S_j(t) = 1 - F_j(t)$.
When the causes of failure are mutually exclusive and exhaustive, the hazard rate $h(t)$ of $T$ can be written as

$$h(t) = \sum_{j=1}^{k} h_j(t).$$

Thus, $S(t)$ is uniquely determined by the identity

$$S(t) = \exp\left( -\int_0^t \sum_{j=1}^{k} h_j(x)\, dx \right).$$

From (3), we get the cumulative incidence function $F_j(t)$ as

$$F_j(t) = \int_0^t h_j(x)\, S(x)\, dx.$$

For properties and applications of (1), (2) and (4), see Carriere and Kochar (2000), Lawless (2003) and Crowder (2012).
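The relations among the cause specific hazards, the survival function and the cumulative incidence functions can be checked numerically. The following is a minimal sketch, assuming $S(x) = \exp(-\int_0^x \sum_j h_j(y)\,dy)$ and $F_j(t) = \int_0^t h_j(x) S(x)\,dx$; the function name `cumulative_incidence` and the constant hazards 0.5 and 1.5 are hypothetical choices made only for illustration.

```python
import math


def cumulative_incidence(t, hazards, j, steps=4000):
    """Midpoint-rule approximation of F_j(t) = int_0^t h_j(x) S(x) dx.

    hazards : list of cause specific hazard functions h_1, ..., h_k
    j       : 0-based index of the cause of interest
    """
    dx = t / steps
    cum_hazard = 0.0  # integral of the overall hazard over [0, i * dx]
    f_j = 0.0
    for i in range(steps):
        x = (i + 0.5) * dx
        total_h = sum(h(x) for h in hazards)
        # S at the midpoint, using the cumulative hazard accumulated so far
        surv_mid = math.exp(-(cum_hazard + 0.5 * total_h * dx))
        f_j += hazards[j](x) * surv_mid * dx
        cum_hazard += total_h * dx
    return f_j


# Hypothetical constant hazards (illustrative values, not from the paper).
rates = [0.5, 1.5]
hazards = [lambda x, a=a: a for a in rates]
t = 1.0

F = [cumulative_incidence(t, hazards, j) for j in range(len(rates))]
# For constant hazards a_j with a = sum a_j: F_j(t) = (a_j / a) (1 - exp(-a t)).
closed_form = [a / sum(rates) * (1.0 - math.exp(-sum(rates) * t)) for a in rates]
print(F, closed_form)
```

For constant hazards the numerical values should agree closely with the closed form, and the $F_j(t)$ should add up to $F(t) = 1 - S(t)$.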
An alternative approach to the modeling and analysis of statistical data is to use the quantile function. The distribution function and the quantile function convey the same information about the underlying random mechanism, but with different implications. Concepts and methodologies based on the distribution function are traditionally employed in statistical theory. However, the quantile function has several interesting properties that are not shared by the distribution function. For example, the sum of two quantile functions is again a quantile function. Parzen (1979) discussed non-parametric statistical modeling of data using the quantile function. Recently, Peng and Fine (2007) developed non-parametric inference procedures for competing risks data using the quantile function. Nair and Sankaran (2009) presented basic reliability concepts, viz. the hazard rate, the mean residual life function etc., in terms of the quantile function. Sankaran, Nair, and Sreedevi (2010) derived a test procedure for comparing various risks using sub-quantile functions, and Soni, Dewan, and Jain (2015) proposed tests for successive comparison of quantiles using the quantile functions. Soni, Dewan, and Jain (2012) developed a non-parametric estimator of the quantile density function. Various properties and applications of the quantile function are available in Gilchrist (2000) and Nair, Sankaran, and Balakrishnan (2013).
The objective of the present work is to supplement the work of Peng and Fine (2007) by introducing quantile based concepts in the competing risks set up. We define the cause specific hazard quantile function, which is the quantile version of (2). The proposed study has several advantages. In many practical situations, the well known parametric models are not appropriate for the analysis of lifetime data. The quantile approach provides new quantile function models, as shown in Section 2, which are useful for the modeling and analysis of lifetime data. In survival studies, censoring is common. In such contexts, quantile based analysis is more appropriate, as quantiles are more robust (Nair et al. (2013)). Finally, the quantile approach gives an alternative methodology for the statistical analysis of competing risks data.
The rest of the article is organized as follows. In Section 2, we present definitions of quantile based reliability concepts useful in competing risks theory. Section 3 discusses non-parametric estimation of the cause specific hazard quantile functions and studies the asymptotic properties of the estimators. A simulation study is carried out in Section 4 to assess the finite sample properties of the estimators. The proposed estimation procedure is illustrated on two real data sets in Section 5. Finally, Section 6 provides the major conclusions of the study.

Cause specific hazard quantile functions
Let $T$ be a non-negative continuous random variable representing the lifetime of a subject, with distribution function $F(t)$ and density function $f(t)$. Assume that $F(t)$ is strictly increasing. Denote by $Q(u) = \inf\{t : F(t) \ge u\}$, $0 < u < 1$, the quantile function of $T$. Since $F(t)$ is strictly increasing, we have $Q(u) = F^{-1}(u)$. Let $Q_j(u)$ be the sub-quantile function defined by

$$Q_j(u) = \inf\{t : F_j(t) \ge u\}, \quad j = 1, 2, \ldots, k, \tag{5}$$

and let $q(u) = \frac{d}{du} Q(u)$ and $q_j(u) = \frac{d}{du} Q_j(u)$ be the quantile density and the sub-quantile density functions, respectively (see Peng and Fine (2007)). Taking the derivative on both sides of the identity $F(Q(u)) = u$, we get $f(Q(u)) = \frac{1}{q(u)}$. Nair and Sankaran (2009) defined the hazard quantile function of $T$, which is the quantile version of $h(t)$, as

$$\Lambda(u) = h(Q(u)) = \frac{1}{(1 - u)\, q(u)}. \tag{6}$$

It is shown that $\Lambda(u)$ uniquely determines the quantile function $Q(u)$ by

$$Q(u) = \int_0^u \frac{dp}{(1 - p)\, \Lambda(p)}. \tag{7}$$

We now define the cause specific hazard quantile function as

$$\Lambda_j(u) = h_j(Q(u)), \quad j = 1, 2, \ldots, k. \tag{8}$$

The quantity $\Lambda_j(u)$ is interpreted as the conditional probability of failure of the subject in the next small interval of time due to cause $j$, given that the subject has survived up to $Q(u)$, the $100u\%$ point of the distribution. It is easy to see from (6) and (8) that $\sum_{j=1}^{k} \Lambda_j(u) = \Lambda(u)$. Thus the hazard quantile function is the sum of the cause specific hazard quantile functions. Further, the identity (10) enables us to determine $Q(u)$ or $Q_j(u)$ from $\Lambda_j(u)$.
Example 2.1 (Constant cause specific hazard quantile function). Suppose that the cause specific hazard quantile function corresponding to the $j$th risk is constant, that is, $\Lambda_j(u) = a_j$, $a_j > 0$, $j = 1, 2, \ldots, k$. Hence the hazard quantile function $\Lambda(u) = \sum_{j=1}^{k} a_j$ is constant. It follows that the lifetime $T$ has an exponential distribution with constant hazard rate $\sum_{j=1}^{k} a_j$. Then, by (7), the quantile function is $Q(u) = -\log(1 - u) / \sum_{j=1}^{k} a_j$. Thus the cause specific hazard functions are proportional.
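The way a hazard quantile function determines the quantile function can be illustrated numerically. The sketch below assumes the relation $Q(u) = \int_0^u dp / ((1 - p)\Lambda(p))$ of Nair and Sankaran (2009) and applies it to the constant case of Example 2.1; the helper name `quantile_from_hazard_quantile` and the values $a_1 = 0.5$, $a_2 = 1.5$ are hypothetical.

```python
import math


def quantile_from_hazard_quantile(hazard_quantile, u, steps=20000):
    """Q(u) = int_0^u dp / ((1 - p) * Lambda(p)), midpoint rule, 0 <= u < 1."""
    dp = u / steps
    total = 0.0
    for i in range(steps):
        p = (i + 0.5) * dp
        total += dp / ((1.0 - p) * hazard_quantile(p))
    return total


# Hypothetical constant cause specific hazard quantile functions a_1, a_2.
a = [0.5, 1.5]
total_hazard = sum(a)  # Lambda(u) is the sum of the Lambda_j(u)

u = 0.5
q_numeric = quantile_from_hazard_quantile(lambda p: total_hazard, u)
q_exact = -math.log(1.0 - u) / total_hazard  # exponential quantile function
print(q_numeric, q_exact)
```

The same helper can be applied to any positive $\Lambda(u)$ on $(0, 1)$, although the simple midpoint rule becomes less accurate as $u$ approaches 1.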

Non-parametric estimation of the cause specific hazard quantile function
We develop a non-parametric estimator of $\Lambda_j(u)$ under right censoring using the kernel density estimation approach. Suppose that the lifetime $T$ is randomly right censored by a variable $Z$. Then we observe a random vector $(Y, \delta, \delta J)$, where $Y = \min(T, Z)$ and $\delta = I(T \le Z)$. Note that $\delta J$ is 0 for a censored observation; otherwise it is the cause of failure. Denote by $G(t)$ and $H(t)$ the distribution functions of $Z$ and $Y$, respectively. When $Z$ and $T$ are independent, we have $1 - H(t) = (1 - F(t))(1 - G(t))$. The tuples $(Y_i, \delta_i, \delta_i J_i)$, $i = 1, 2, \ldots, n$, are assumed to be realizations of $(Y, \delta, \delta J)$ for the $n$ subjects under study.
Under right censoring, let $Y_{(1)} < Y_{(2)} < \cdots$ denote the ordered failure times. The Kaplan-Meier estimator of $S(t)$ is

$$\hat S(t) = \prod_{k : Y_{(k)} \le t} \left( 1 - \frac{d_k}{n_k} \right),$$

where $d_k$ is the number of failures at $Y_{(k)}$ and $n_k$ is the number of subjects at risk just before $Y_{(k)}$. Let $Y_{j(1)} < Y_{j(2)} < \cdots < Y_{j(n_j)}$ be the ordered failure times due to cause $j$. The Kaplan-Meier estimator of $S_j(t)$ is obtained as

$$\hat S_j(t) = \prod_{k : Y_{j(k)} \le t} \left( 1 - \frac{d_{jk}}{n_{jk}} \right),$$

where $d_{jk}$ is the number of failures at $Y_{j(k)}$ and $n_{jk}$ is the number of subjects at risk just before $Y_{j(k)}$. A simple non-parametric estimator $\hat\Lambda_j(u)$ of $\Lambda_j(u)$ is given by (13), where $\hat Q(u) = \inf\{x : \hat F(x) > u\}$ is the non-parametric estimator of $Q(u)$ and $K(x)$ is a kernel function satisfying conditions (a) to (d). Substituting (15) in (13), we get an estimator of $\Lambda_j(u)$.
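A kernel-type estimator of this kind can be sketched as follows. This is only an illustration under stated assumptions, not necessarily the estimator defined in the text: it assumes the plug-in form $\hat\Lambda_j(u) = \hat f_j(\hat Q(u)) / (1 - u)$, where $\hat f_j$ smooths the jumps of a non-parametric cumulative incidence estimate (Kaplan-Meier survival just before each failure time multiplied by the cause $j$ failure fraction) with the Epanechnikov kernel, and it takes the bandwidth on the time scale. All function names are hypothetical.

```python
import math


def epanechnikov(x):
    """Epanechnikov kernel K(x) = 0.75 (1 - x^2) on [-1, 1]."""
    return 0.75 * (1.0 - x * x) if abs(x) <= 1.0 else 0.0


def cause_specific_hazard_quantile(times, causes, j, u, bandwidth):
    """Assumed plug-in estimate f_j_hat(Q_hat(u)) / (1 - u).

    times  : observed Y_i = min(T_i, Z_i)
    causes : delta_i * J_i (0 = censored, otherwise the cause of failure)
    """
    data = sorted(zip(times, causes))
    n = len(data)
    surv = 1.0   # Kaplan-Meier estimate of S just before the current time
    jumps = []   # (time, jump of F_hat, jump of F_j_hat)
    i = 0
    while i < n:
        t = data[i][0]
        at_risk = n - i
        d_all = 0
        d_j = 0
        while i < n and data[i][0] == t:  # handle tied observation times
            if data[i][1] != 0:
                d_all += 1
            if data[i][1] == j:
                d_j += 1
            i += 1
        if d_all > 0:
            jumps.append((t, surv * d_all / at_risk, surv * d_j / at_risk))
            surv *= 1.0 - d_all / at_risk

    # Q_hat(u) = inf{x : F_hat(x) > u}
    cum = 0.0
    q_hat = None
    for t, d_f, _ in jumps:
        cum += d_f
        if cum > u:
            q_hat = t
            break
    if q_hat is None:
        raise ValueError("u is beyond the range of the estimated distribution")

    # Kernel smoothing of the jumps of F_j_hat around Q_hat(u)
    f_j = sum(epanechnikov((q_hat - t) / bandwidth) * d_fj
              for t, _, d_fj in jumps) / bandwidth
    return f_j / (1.0 - u)


# Deterministic illustration: exponential quantiles with total hazard 2 and
# causes alternating between 1 and 2, so each Lambda_j(u) is about 1.
n = 2000
times = [-math.log(1.0 - (i + 0.5) / n) / 2.0 for i in range(n)]
causes = [1 + (i % 2) for i in range(n)]
est = cause_specific_hazard_quantile(times, causes, j=1, u=0.5, bandwidth=0.1)
print(est)
```

In the uncensored case each failure contributes a jump of $1/n$ to $\hat F$, so the estimator reduces to an ordinary kernel density estimate restricted to failures from cause $j$.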
Theorem 1. Suppose that K(x) satisfies conditions (a) to (d).Assume that both F (x) and The proof is given in Appendix A.
In the following theorem, we prove the limiting distribution of follows a normal distribution with mean 0 and variance σ 2 j (u), where, The proof is given in Appendix B.
Remark 1. Since the analytical expression of $\sigma_j^2(u)$ is complex, we use the bootstrap procedure to estimate the variance of $\hat\Lambda_j(u)$, $j = 1, 2, \ldots, k$. The bootstrap method is based on resampling from the original data. We draw $B$ samples of size $n$ from the original data by random sampling with replacement, so that each bootstrap sample consists of $n$ tuples of the form $(Y, \delta, \delta J)$. We compute $\hat\Lambda_j(u)$ using the original data set, and denote the estimate of $\Lambda_j(u)$ obtained from the $k$th bootstrap sample by $\hat\Lambda_j^{(k)}(u)$, $k = 1, 2, \ldots, B$. We then compute the differences $\hat\Lambda_j^{(k)}(u) - \hat\Lambda_j(u)$, $j = 1, 2$; $k = 1, 2, \ldots, B$, and use them to calculate the average bias and MSE.
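The resampling scheme of Remark 1 can be sketched generically. In the minimal example below, `bootstrap_bias_mse` is a hypothetical helper that accepts any estimator acting on a list of observation tuples; the toy estimator (the share of observations failing from cause 1) and the small data set are illustrative stand-ins, not the estimator or data of the paper.

```python
import random


def bootstrap_bias_mse(data, estimator, n_boot=200, seed=0):
    """Bootstrap bias and MSE of `estimator` around its original-data value.

    data      : list of observation tuples, e.g. (Y_i, delta_i * J_i)
    estimator : function mapping such a list to a number
    """
    rng = random.Random(seed)
    n = len(data)
    original = estimator(data)
    diffs = []
    for _ in range(n_boot):
        # Draw n observations with replacement from the original data
        sample = [data[rng.randrange(n)] for _ in range(n)]
        diffs.append(estimator(sample) - original)
    bias = sum(diffs) / n_boot
    mse = sum(d * d for d in diffs) / n_boot
    return original, bias, mse


# Hypothetical toy data: (observed time, cause), with cause 0 meaning censored.
data = [(0.4, 1), (0.9, 2), (1.3, 0), (0.2, 1),
        (1.8, 2), (0.7, 1), (2.5, 0), (1.1, 2)]


def share_cause_1(sample):
    """Toy stand-in estimator: proportion of observations failing from cause 1."""
    return sum(1 for _, cause in sample if cause == 1) / len(sample)


original, bias, mse = bootstrap_bias_mse(data, share_cause_1, n_boot=500, seed=1)
print(original, bias, mse)
```

Replacing `share_cause_1` by a function that evaluates an estimate of $\Lambda_j(u)$ at a fixed $u$ would give the bootstrap bias and MSE described in the remark.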

Simulations
We now carry out extensive simulation studies to assess the mean squared error (MSE) and bias of the estimator $\hat\Lambda_j(u)$ in the uncensored as well as the censored case. We consider two causes of failure and sample sizes 50, 100 and 200. We generate 5000 data sets for each scenario. The sub-quantile orders considered are $u = 0.2\,(0.2)\,0.8$, that is, $u = 0.2, 0.4, 0.6$ and $0.8$. We employed the triangular, uniform and Epanechnikov kernel functions in the simulation studies. However, results are reported only for the Epanechnikov kernel, as it provides the smallest MSE. The Epanechnikov kernel is defined by $K(x) = \frac{3}{4}(1 - x^2)\, I(|x| \le 1)$. To generate random numbers, we consider the following two quantile function models.
(1) Linear cause specific hazard quantile function (Midhu, Sankaran, and Nair (2014)). Suppose that the cause specific hazard quantile function for cause $j$ is given by $\Lambda_j(u) = a_j + b_j u$, $a_j > 0$, $a_j + b_j > 0$, $0 < u < 1$.
The cause specific hazard quantile function for the risk $j$ is, The relation $\Lambda(u) = \sum_{j=1}^{k} \Lambda_j(u)$ and identity (7) provide the hazard quantile function and quantile function as, where $\beta = \sum_{j=1}^{k} \xi_j^{-\phi}$.
Since the proposed estimator of the cause specific hazard quantile function is based on the kernel function, the choice of bandwidth is an important issue. For the construction of a kernel type estimator of a quantile function, Padgett (1986) considered separate bandwidths for different regions of $u \in (0, 1)$, chosen so that the mean squared error (MSE) is minimized. In our study, we calculate the optimum bandwidths corresponding to the values $u = 0.2, 0.4, 0.6$ and $0.8$. The average of the optimal bandwidths obtained for these values of $u$ is employed for the construction of the proposed estimators.
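The bandwidth rule described above (minimize the MSE separately at each $u$, then average the minimizers) can be sketched as follows. The function `averaged_optimal_bandwidth`, the bandwidth grid and the toy MSE surface are hypothetical; in practice the MSE at each pair $(u, h)$ would come from simulation or from the bootstrap.

```python
def averaged_optimal_bandwidth(mse, u_values, grid):
    """Average of the per-u bandwidths that minimise mse(u, h) over `grid`."""
    best = [min(grid, key=lambda h, u=u: mse(u, h)) for u in u_values]
    return sum(best) / len(best), best


# Hypothetical MSE surface whose minimiser in h drifts with u (illustration only).
def toy_mse(u, h):
    return (h - (0.3 + 0.5 * u)) ** 2 + 0.01


u_values = [0.2, 0.4, 0.6, 0.8]
grid = [0.05 * i for i in range(1, 21)]  # candidate bandwidths 0.05, ..., 1.00

h_avg, h_each = averaged_optimal_bandwidth(toy_mse, u_values, grid)
print(h_each, h_avg)
```

Averaging gives a single bandwidth for the whole curve, at the cost of being suboptimal at any individual $u$.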
To perform the simulation study, we use the same parameter combinations for the linear cause specific hazard quantile function model in both the censored and the uncensored cases. The same procedure is adopted for the Weibull cause specific hazard model. The parameter values chosen for the linear cause specific hazard quantile function model are $a_1 = 1/2$, $b_1 = 3$, $a_2 = 1/3$ and $b_2 = 2$. For the Weibull model, we take $\phi = 3$, $\xi_1 = 1$ and $\xi_2 = 2$.
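Data from the linear model can be generated by inverse transform sampling. The sketch below is one possible scheme under stated assumptions, not necessarily the one used in the study: with $a = \sum_j a_j$ and $b = \sum_j b_j$, integrating $1/((1 - p)(a + bp))$ gives $Q(u) = \frac{1}{a + b} \log\frac{a + bu}{a(1 - u)}$, and, given $T = Q(u)$, the cause is drawn with probability $\Lambda_j(u) / \Lambda(u)$. The parameter values are those stated above; the function names are hypothetical.

```python
import math
import random

# Parameters of the linear model Lambda_j(u) = a_j + b_j * u for causes 1 and 2.
A = (0.5, 1.0 / 3.0)
B = (3.0, 2.0)


def hazard_quantile_j(u, j):
    """Cause specific hazard quantile function for the 0-based cause index j."""
    return A[j] + B[j] * u


def quantile(u):
    """Q(u) obtained by integrating 1 / ((1 - p) (a + b p)) from 0 to u."""
    a, b = sum(A), sum(B)
    return math.log((a + b * u) / (a * (1.0 - u))) / (a + b)


def simulate(n, seed=0):
    """Draw n uncensored pairs (T, J) from the linear model."""
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        u = rng.random()
        lam = [hazard_quantile_j(u, j) for j in range(len(A))]
        # Given T = Q(u), cause j occurs with probability Lambda_j(u) / Lambda(u)
        r = rng.random() * sum(lam)
        cause = len(lam)
        acc = 0.0
        for idx, value in enumerate(lam, start=1):
            acc += value
            if r < acc:
                cause = idx
                break
        sample.append((quantile(u), cause))
    return sample


sample = simulate(20000, seed=1)
frac1 = sum(1 for _, cause in sample if cause == 1) / len(sample)
print(frac1)
```

With these parameter values $a_1 / a = b_1 / b = 0.6$, so $\Lambda_1(u) / \Lambda(u) = 0.6$ for every $u$ and about 60% of the simulated failures should be due to cause 1.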

Results for the uncensored case
We first consider the linear cause specific hazard quantile function for sample sizes $n = 50$, 100 and 200. The estimators $\hat\Lambda_j(u)$, $j = 1, 2$, are calculated for all values of $u$ ($0 < u < 1$), which provides smooth curves. Then the average bias and MSE of the estimators are computed. The bandwidths for $\hat\Lambda_1(u)$ and $\hat\Lambda_2(u)$ are obtained as 0.52 and 0.64, respectively. Figures 1(a) and 1(b) show the mean of the estimators and the true values of $\Lambda_j(u)$, $j = 1, 2$, for $n = 200$. The results for $n = 50$ and 100 are similar. Table 1 presents the average bias and MSE of the estimators.

We then consider the Weibull cause specific hazard model (15). The estimators $\hat\Lambda_j(u)$, $j = 1, 2$, are calculated. The bandwidths which give the minimum MSE for $\hat\Lambda_1(u)$ and $\hat\Lambda_2(u)$ are 0.72 and 0.44, respectively. Figures 2(a) and 2(b) show the mean of the estimators of $\Lambda_j(u)$, $j = 1, 2$, for $n = 200$. Table 2 gives the average bias and MSE of the estimators of the cause specific hazard quantile functions. Note that both the average bias and the MSE decrease as the sample size increases.

Results for the censored case
The censoring times are generated from the uniform distribution $U(0, C)$, where $C$ is chosen such that 20% of the observations are censored. We first consider the linear cause specific hazard quantile function model. We compute the average bias and MSE of the estimators $\hat\Lambda_j(u)$, $j = 1, 2$. The bandwidths which give the minimum MSE for $\hat\Lambda_1(u)$ and $\hat\Lambda_2(u)$ are 0.67 and 0.31, respectively.
Figure 3 shows the mean of the estimators and the true values of $\Lambda_j(u)$, $j = 1, 2$, for $n = 200$. Table 3 presents the average bias and MSE under censoring. Both the average bias and the MSE decrease as the sample size increases.
We then generate observations from the Weibull cause specific hazard model with the censoring scheme given above. The estimators $\hat\Lambda_j(u)$, $j = 1, 2$, are calculated and their average bias and MSE are computed. The bandwidths which give the minimum MSE for $\hat\Lambda_1(u)$ and $\hat\Lambda_2(u)$ are 0.59 and 0.38, respectively. Figure 4 shows the mean of the estimators and the true values of $\Lambda_j(u)$, $j = 1, 2$, for $n = 200$. Table 4 presents the average bias and MSE of the estimators of $\Lambda_j(u)$, $j = 1, 2$. It follows that the average bias and MSE of $\hat\Lambda_j(u)$, $j = 1, 2$, are small and that both decrease as the sample size increases.
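The constant $C$ of the uniform censoring scheme can be found numerically from the quantile function. The sketch below is one possible approach under stated assumptions: for $Z \sim U(0, C)$ independent of $T$, $P(Z < T) = E[\min(T, C)] / C = \frac{1}{C} \int_0^1 \min(Q(u), C)\, du$, which is decreasing in $C$, so bisection applies. It uses the linear model with $a = 1/2 + 1/3$ and $b = 3 + 2$, for which integrating $1/((1 - p)(a + bp))$ gives the closed form of $Q(u)$ used below; the function names are hypothetical.

```python
import math


def quantile(u, a=0.5 + 1.0 / 3.0, b=3.0 + 2.0):
    """Q(u) for the linear hazard quantile function Lambda(u) = a + b u."""
    return math.log((a + b * u) / (a * (1.0 - u))) / (a + b)


def censoring_proportion(c, quantile_fn, steps=5000):
    """P(Z < T) for Z ~ U(0, c): midpoint rule for int_0^1 min(Q(u), c) du / c."""
    du = 1.0 / steps
    total = sum(min(quantile_fn((i + 0.5) * du), c) for i in range(steps))
    return total * du / c


def solve_censoring_constant(target, quantile_fn, lo=1e-6, hi=100.0, iters=50):
    """Bisection for c such that censoring_proportion(c) equals `target`."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if censoring_proportion(mid, quantile_fn) > target:
            lo = mid  # too much censoring: a larger c is needed
        else:
            hi = mid
    return 0.5 * (lo + hi)


C = solve_censoring_constant(0.2, quantile)
print(C, censoring_proportion(C, quantile))
```

Any other model can be handled by passing its quantile function as `quantile_fn`.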

Real data illustration
In this section, we apply the proposed procedure to two real life data sets. The first is uncensored and the second is censored.

Hoel data (Hoel (1972))
The data were obtained from a laboratory experiment on two groups of RFM strain male mice which had received a radiation dose of 300r at an age of 5-6 weeks. The first group of mice lived in a conventional laboratory environment, while the second group lived in a germ-free environment. There are three major causes of death, viz. thymic lymphoma, reticulum cell sarcoma and other causes. All mice had died by the end of the study, so there is no censoring. We consider the data from the first group of 99 mice for analysis. We combine the last two causes since the number of deaths due to reticulum cell sarcoma is small. Thus the two causes used in the analysis are thymic lymphoma ($J_1$) and other causes ($J_2$), the latter including reticulum cell sarcoma. The interest is in comparing the mortality from these two modes of death. The estimators of $\Lambda_j(u)$, $j = 1, 2$, are computed as described in Section 3. The bandwidth which minimizes the bootstrap MSE has been chosen. The bandwidths thus obtained for $\hat\Lambda_j(u)$, $j = 1, 2$, are 0.71 and 0.29, respectively.
Figure 5 shows the cause specific hazard quantile functions.From Figure 5, it is clear that the cause specific hazard quantile function due to thymic lymphoma is uniformly smaller than that due to other causes.We also observe that the two cause specific hazard functions are closer to new lifetime models useful for the analysis of competing risks data.The smooth kernel type estimator of cause specific hazard quantile function has been developed for uncensored as well as censored data.Asymptotic properties of the proposed estimator were studied.The estimator performs well in terms of average bias and MSE for linear cause specific hazard model as well as for Weibull cause specific hazard model.The procedure has been applied to two real lifetime data sets.
The proposed work based on the cause specific hazard quantile functions is an alternative method for the modeling and analysis of competing risks data. This technique has the ability to pick up differences at extreme values of the data. The quantile models presented here will enable the practitioner to differentiate between the effects of various risks. In survival studies, it is often of interest to compare various risks. Such comparisons can be carried out by developing non-parametric tests using $\hat\Lambda_j(u)$. The work in this direction will be reported elsewhere.

Figure 1: Mean of the estimators and true values of $\Lambda_j(u)$, $j = 1, 2$, for the linear cause specific hazard model with optimal bandwidths for $n = 200$ (uncensored).

Figure 2: Mean of the estimators and true values of $\Lambda_j(u)$, $j = 1, 2$, for the Weibull cause specific hazard model with optimal bandwidths for $n = 200$ (uncensored).

Figure 3: Mean of the estimators and true values of $\Lambda_j(u)$, $j = 1, 2$, for the linear cause specific hazard model with optimal bandwidths for $n = 200$ (censored).

Figure 4: Mean of the estimators and true values of $\Lambda_j(u)$, $j = 1, 2$, for the Weibull cause specific hazard model with optimal bandwidths for $n = 200$ (censored).

Figure 5: Estimates of cause specific hazard quantile functions for Hoel data.

Table 2: Average bias and MSE of $\hat\Lambda_1(u)$ and $\hat\Lambda_2(u)$ for the Weibull cause specific hazard model (uncensored) with the optimal bandwidths.

Table 3: Average bias and MSE of $\hat\Lambda_1(u)$ and $\hat\Lambda_2(u)$ for the linear cause specific hazard model (censored) with the optimal bandwidths.

Table 4: Average bias and MSE of $\hat\Lambda_1(u)$ and $\hat\Lambda_2(u)$ for the Weibull cause specific hazard model (censored) with the optimal bandwidths.