On Boundary Correction in Kernel Estimation of ROC Curves

Abstract: The Receiver Operating Characteristic (ROC) curve is a statistical tool for evaluating the accuracy of diagnostic tests. The empirical ROC curve, which is a step function, is the most commonly used nonparametric estimator of the ROC curve. Kernel smoothing methods, on the other hand, have been used to obtain smooth ROC curves; this approach is based on kernel estimates of the distribution functions. It has been observed that kernel distribution estimators are not consistent when estimating a distribution function near the boundary of its support. This problem is due to "boundary effects" that occur in nonparametric functional estimation. To avoid these difficulties, we propose a generalized reflection method of boundary correction for the estimation of ROC curves. The proposed method generates a class of boundary corrected estimators.


Austrian Journal of Statistics, Vol. 38 (2009), No. 1, 17-32

Introduction
The Receiver Operating Characteristic (ROC) describes the performance of a diagnostic test which classifies subjects into either the group without the condition, $G_0$, or the group with the condition, $G_1$, by means of a continuous discriminant score $X$; i.e., a subject is classified as $G_1$ if $X \ge d$ and as $G_0$ otherwise, for a given cutoff point $d \in \mathbb{R}$. The ROC is defined as a plot of the probability of true classification of subjects from $G_1$ versus the probability of false classification of subjects from $G_0$ across all possible cutoff point values of $X$. Specifically, let $F_0$ and $F_1$ denote the distribution functions of $X$ in the groups $G_0$ and $G_1$, respectively. Then the ROC curve can be written as
$$R(p) = 1 - F_1\big(F_0^{-1}(1 - p)\big), \quad 0 < p < 1,$$
where $p$ is the false positive rate in $(0, 1)$ as the corresponding cutoff point ranges from $-\infty$ to $+\infty$, and $F_0^{-1}$ denotes the inverse function of $F_0$.
A simple nonparametric estimator of $R(p)$ uses the empirical distribution functions for $F_0$ and $F_1$. The resulting ROC curve is a step function and is called the empirical ROC curve. Another type of nonparametric estimator of $R(p)$ is derived from kernel smoothing methods. Kernel smoothing is widely used mainly because it is easy to derive and has good asymptotic and small sample properties. Kernel smoothing has received considerable attention in the density estimation context; see, for example, the monographs of Silverman (1986) and Wand and Jones (1995). However, applications of kernel smoothing in distribution function estimation are relatively few. Some theoretical properties of a kernel distribution function estimator have been investigated by Nadaraya (1964), Reiss (1981), and Azzalini (1981). Lloyd (1998) proposed a nonparametric estimator of the ROC curve by using kernel estimators for the distribution functions $F_0$ and $F_1$. Lloyd and Yong (1999) showed that Lloyd's estimator has better mean squared error properties than the empirical ROC curve estimator. However, Lloyd's estimator has some drawbacks. For example, it is unreliable near the end points of the support of the ROC curve due to so-called "boundary effects" that occur in nonparametric functional estimation. Although there is a vast literature on boundary correction in the density estimation context, the boundary effects problem in the distribution function context has been less studied.
In this paper, we develop a new kernel-type estimator of the ROC curve that removes boundary effects near the end points of the support. Our estimator is based on a new boundary corrected kernel estimator of distribution functions, which in turn builds on ideas of Karunamuni and Alberts (2005a, 2005b, 2006), Zhang and Karunamuni (1998, 2000), Karunamuni and Zhang (2008), and Zhang, Karunamuni, and Jones (1999) developed for boundary correction in kernel density estimation. The basic construction of the proposed estimator is a kind of generalized reflection method that involves reflecting a transformation of the observed data. In fact, the proposed method generates a class of boundary corrected estimators. We derive expressions for the bias and variance of the proposed estimator. Furthermore, the proposed estimator is compared with the "classical estimator" using simulation studies. We observe that the proposed estimator successfully removes boundary effects and performs considerably better than the "classical estimator".
Kernel smoothing in distribution function and ROC curve estimation is discussed in the next section. The proposed estimator is given in Section 3. Simulation results are given in Section 4. A real data example is analyzed in Section 5. Finally, some concluding remarks are given in Section 6.

Kernel ROC Estimator
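Suppose that independent samples $X_{01}, \ldots, X_{0n_0}$ and $X_{11}, \ldots, X_{1n_1}$ are available from two unknown distributions $F_0$ and $F_1$, respectively, where $F_0 \in G_0$ and $F_1 \in G_1$, and $G_0$ and $G_1$ denote two groups of continuous distribution functions. Then a simple nonparametric estimator of the ROC curve is the empirical ROC curve
$$\tilde R(p) = 1 - \tilde F_1\big(\tilde F_0^{-1}(1 - p)\big), \quad 0 < p < 1,$$
where $\tilde F_0$ and $\tilde F_1$ denote the empirical distribution functions of $F_0$ and $F_1$ based on the data $X_{01}, \ldots, X_{0n_0}$ and $X_{11}, \ldots, X_{1n_1}$, respectively; that is,
$$\tilde F_0(x) = \frac{1}{n_0}\sum_{i=1}^{n_0} I(X_{0i} \le x), \qquad \tilde F_1(x) = \frac{1}{n_1}\sum_{i=1}^{n_1} I(X_{1i} \le x).$$
Note that $\tilde R$ is not a continuous function. In fact, it is a step function on the interval $[0, 1]$. This is a notable weakness of the empirical ROC curve $\tilde R(p)$. Since the ROC curve is a smooth function of $p$, we would expect to have an estimator that is smooth as well. Lloyd (1998) proposed a smooth estimator using kernel smoothing techniques. His idea is to replace the unknown distributions $F_0$ and $F_1$ by two smooth kernel estimators. Specifically, he employed the following kernel estimators of $F_0$ and $F_1$:
$$\hat F_0(x) = \frac{1}{n_0}\sum_{i=1}^{n_0} W\!\left(\frac{x - X_{0i}}{h_0}\right), \qquad \hat F_1(x) = \frac{1}{n_1}\sum_{i=1}^{n_1} W\!\left(\frac{x - X_{1i}}{h_1}\right),$$
where $W(x) = \int_{-1}^{x} K(t)\,dt$, $h_0$ and $h_1$ denote bandwidths ($h_0 \to 0$ and $h_1 \to 0$ as $n_0 \to \infty$ and $n_1 \to \infty$, respectively), and $K$ is a unimodal symmetric density function with support $[-1, 1]$. The corresponding estimator of the ROC curve $R(p)$ is then given by
$$\hat R(p) = 1 - \hat F_1\big(\hat F_0^{-1}(1 - p)\big), \quad 0 < p < 1.$$
An example of a smooth estimate of $R(p)$ using $\hat R(p)$ is illustrated in Figure 1.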
When $G_0$ and $G_1$ contain distributions with finite support, the estimator $\hat R$ exhibits boundary effects near the endpoints of the support, because the same boundary effects occur in the uncorrected kernel estimators $\hat F_0$ and $\hat F_1$. The main purpose of this article is to improve the kernel distribution estimators and thereby to avoid boundary effects in smooth kernel ROC estimators. Details of the boundary problem with $\hat F_0$ and $\hat F_1$ are described in the next section.
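The two estimators of this section can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the ROC curve has the usual form $1 - F_1(F_0^{-1}(1 - p))$, uses the quartic kernel for $K$, and inverts the kernel estimate of $F_0$ by bisection. The function names, sample data, and bandwidths are hypothetical.

```python
import bisect
import math
import random


def quartic_cdf(x):
    """W(x): integral of the quartic kernel 15/16 * (1 - t^2)^2 from -1 to x."""
    if x <= -1.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    return 0.5 + (15.0 / 16.0) * (x - 2.0 * x**3 / 3.0 + x**5 / 5.0)


def kernel_cdf(x, sample, h):
    """Uncorrected kernel distribution estimate: mean of W((x - X_i) / h)."""
    return sum(quartic_cdf((x - xi) / h) for xi in sample) / len(sample)


def empirical_roc(p, sample0, sample1):
    """Step-function estimate 1 - F1_emp(F0_emp^{-1}(1 - p)) for 0 < p < 1."""
    s0, s1 = sorted(sample0), sorted(sample1)
    # Generalized inverse: smallest order statistic x with F0_emp(x) >= 1 - p.
    k = max(1, math.ceil((1.0 - p) * len(s0) - 1e-12))
    cutoff = s0[k - 1]
    return 1.0 - bisect.bisect_right(s1, cutoff) / len(s1)


def kernel_roc(p, sample0, sample1, h0, h1, tol=1e-10):
    """Smooth estimate 1 - F1_hat(F0_hat^{-1}(1 - p)) for 0 < p < 1."""
    q = 1.0 - p
    # F0_hat is 0 at lo and 1 at hi, so bisection brackets the q-quantile.
    lo, hi = min(sample0) - h0, max(sample0) + h0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if kernel_cdf(mid, sample0, h0) >= q:
            hi = mid
        else:
            lo = mid
    return 1.0 - kernel_cdf(hi, sample1, h1)


rng = random.Random(1)
x0 = [rng.expovariate(1.0) for _ in range(50)]  # hypothetical scores, group G0
x1 = [rng.expovariate(0.5) for _ in range(50)]  # hypothetical scores, group G1
for p in (0.1, 0.5, 0.9):
    # Bandwidths 0.5 and 1.0 are arbitrary illustrative values.
    print(p, empirical_roc(p, x0, x1), kernel_roc(p, x0, x1, 0.5, 1.0))
```

Bisection is used here only because it needs no external library; any monotone root finder would serve equally well for inverting the kernel estimate.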

Kernel Distribution Estimator and Boundary Effects
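Let $f$ denote a continuous density function with support $[0, a]$, $0 < a \le \infty$, and consider nonparametric estimation of the cumulative distribution function $F$ of $f$ based on a random sample $X_1, \ldots, X_n$ from $f$. Suppose that $F^{(j)}$, the $j$-th derivative of $F$, exists and is continuous on $[0, a]$, $j = 0, 1, 2$, with $F^{(0)} = F$ and $F^{(1)} = f$. Then the traditional kernel estimator of $F$ is given by
$$\hat F_{h,K}(x) = \frac{1}{n}\sum_{i=1}^{n} W\!\left(\frac{x - X_i}{h}\right), \qquad W(x) = \int_{-1}^{x} K(t)\,dt,$$
where $K$ is a symmetric density function with support $[-1, 1]$ and $h$ is the bandwidth ($h \to 0$ as $n \to \infty$). The basic properties of $\hat F_{h,K}(x)$ at interior points are well known (e.g., Lejeune and Sarda, 1992), and under some smoothness assumptions these include the standard expansions
$$E\,\hat F_{h,K}(x) - F(x) = \frac{h^2}{2} f^{(1)}(x)\int_{-1}^{1} t^2 K(t)\,dt + o(h^2),$$
$$\mathrm{Var}\,\hat F_{h,K}(x) = \frac{1}{n}F(x)\{1 - F(x)\} - \frac{h}{n} f(x)\int_{-1}^{1} W(t)\{1 - W(t)\}\,dt + o\!\left(\frac{h}{n}\right).$$
The performance of $\hat F_{h,K}(x)$ at boundary points, i.e., for $x \in [0, h) \cup (a - h, a]$, however, differs from that at interior points due to so-called "boundary effects" that occur in nonparametric curve estimation problems. More specifically, the bias of $\hat F_{h,K}(x)$ is of order $O(h)$ instead of $O(h^2)$ at boundary points; this is what expression (1), the bias of $\hat F_{h,K}$ at a boundary point $x = ch$, $0 \le c \le 1$, makes clear. To remove this boundary effect in kernel distribution estimation we investigate a new class of estimators in the next section.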
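The origin of the $O(h)$ term can be seen from a short calculation. The following is a sketch derived here from the stated assumptions ($F(0) = 0$, $F^{(2)}$ continuous near 0, $K$ symmetric on $[-1, 1]$); it gives only the leading term and is not a reproduction of expression (1). The shorthand $I_c$ for the remaining integral is introduced for this sketch.

```latex
\begin{align*}
E\,\hat F_{h,K}(ch)
  &= \int_0^\infty W\!\left(c - \tfrac{y}{h}\right) f(y)\,dy
   = \int_{-1}^{c} K(t)\,F\big(h(c - t)\big)\,dt
   && \text{(integration by parts, } t = c - y/h\text{)} \\
  &= h f(0)\int_{-1}^{c} (c - t)K(t)\,dt + O(h^2)
   = h f(0)\int_{-1}^{c} W(t)\,dt + O(h^2), \\
\int_{-1}^{c} W(t)\,dt
  &= c + \int_{-1}^{-c} W(t)\,dt
   && \text{(since } W(t) + W(-t) = 1\text{)}, \\
F(ch) &= c\,h f(0) + O(h^2), \\
E\,\hat F_{h,K}(ch) - F(ch)
  &= h f(0)\,I_c + O(h^2),
  \qquad I_c = \int_{-1}^{-c} W(t)\,dt .
\end{align*}
```

Since $I_c > 0$ for $0 \le c < 1$ and $I_1 = 0$, the first-order term is present throughout the left boundary region whenever $f(0) > 0$ and disappears only as $x$ reaches the interior.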

The Proposed Estimator
In this section we propose a class of estimators of the distribution function $F$ of the form
$$\tilde F_{h,K}(x) = \frac{1}{n}\sum_{i=1}^{n}\left\{ W\!\left(\frac{x - g_1(X_i)}{h}\right) - W\!\left(-\frac{x + g_2(X_i)}{h}\right)\right\}, \qquad (3)$$
where $h$ is the bandwidth, $K$ is a symmetric density function with support $[-1, 1]$, and $g_1$ and $g_2$ are two transformations that need to be determined. The same type of estimator in the density estimation case has been discussed in Zhang et al. (1999). As in that paper, we assume that $g_i$, $i = 1, 2$, are nonnegative, continuous and monotonically increasing functions defined on $[0, \infty)$. Further assume that $g_i^{-1}$ exists, $g_i(0) = 0$, $g_i^{(1)}(0) = 1$, and that $g_i^{(2)}$ exists and is continuous on $[0, \infty)$, where $g_i^{(j)}$ denotes the $j$-th derivative of $g_i$, with $g_i^{(0)} = g_i$ and $g_i^{-1}$ denoting the inverse function of $g_i$, $i = 1, 2$. We will choose $g_1$ and $g_2$ such that $\tilde F_{h,K}(x) \ge 0$ everywhere. Note that the $i$-th term of the sum in (3) can be expressed as
$$W\!\left(\frac{x - g_1(X_i)}{h}\right) - W\!\left(-\frac{x + g_2(X_i)}{h}\right) = \int_{-(x + g_2(X_i))/h}^{(x - g_1(X_i))/h} K(t)\,dt.$$
The preceding integral is non-negative provided the inequality $-x + g_1(X_i) \le x + g_2(X_i)$ holds. Since $x \ge 0$, this inequality will be satisfied if $g_1$ and $g_2$ are such that $g_1(X_i) \le g_2(X_i)$ for $i = 1, \ldots, n$. Thus we will assume that $g_1$ and $g_2$ are chosen such that $g_1(x) \le g_2(x)$ for $x \in [0, \infty)$ for our proposed estimator.
The bias and variance of (3) at $x = ch$, $0 \le c \le 1$, are given by expressions (4) and (5), respectively; the proofs of (4) and (5) are given in the Appendix. Note that the contribution of $g_2$ to the bias vanishes as $c \to 1$. By comparing expressions (1), (4), (2), and (5) at boundary points we can see that the variances are of the same order and that the bias of $\tilde F_{h,K}$ is of order $O(h^2)$, whereas the bias of $\hat F_{h,K}$ is of order $O(h)$. So our proposed estimator removes boundary effects in kernel distribution estimation, since the bias at boundary points is of the same order as the bias at interior points.
It is clear that there are various possible choices available for the pair $(g_1, g_2)$. However, we will choose $g_1$ and $g_2$ so that the condition $\tilde F_{h,K}(0) = 0$ is satisfied, because $F(0) = 0$. A sufficient (but not necessary) condition for this is that $g_1$ and $g_2$ are equal. Thus we need to construct a single transformation function $g$ such that $g = g_1 = g_2$. Other desirable properties of the estimator $\tilde F_{h,K}$ are local adaptivity (i.e., the transformation function $g$ depends on $c$) and that $\tilde F_{h,K}(x)$ equals the usual kernel estimator $\hat F_{h,K}(x)$ at interior points. For the latter, $g$ must satisfy $g(y) \to y$ as $c \to 1$. In order to display the dependence of $g$ on $c$, $0 \le c \le 1$, we shall denote $g$ by $g_c$ in what follows.
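Why the reflection term removes the first-order bias can be checked directly. The calculation below is a sketch worked out here from the assumptions on $g_1$ and $g_2$; it tracks only the $O(h)$ terms and does not reproduce expressions (4) and (5). The notation $G_i = F \circ g_i^{-1}$ and $I_c = \int_{-1}^{-c} W(t)\,dt$ is shorthand introduced for this sketch.

```latex
\begin{align*}
&G_i(u) = F\big(g_i^{-1}(u)\big), \qquad G_i(0) = 0, \qquad G_i^{(1)}(0) = f(0),
  \qquad i = 1, 2, \\[4pt]
&E\,W\!\left(\frac{ch - g_1(X)}{h}\right)
   = \int_{-1}^{c} K(t)\,G_1\big(h(c - t)\big)\,dt
   = h f(0)\int_{-1}^{c} (c - t)K(t)\,dt + O(h^2)
   = h f(0)\,(c + I_c) + O(h^2), \\[4pt]
&E\,W\!\left(-\frac{ch + g_2(X)}{h}\right)
   = \int_{-1}^{-c} K(t)\,G_2\big(-h(c + t)\big)\,dt
   = h f(0)\int_{-1}^{-c} (-c - t)K(t)\,dt + O(h^2)
   = h f(0)\,I_c + O(h^2), \\[4pt]
&E\,\tilde F_{h,K}(ch) - F(ch)
   = h f(0)\,(c + I_c) - h f(0)\,I_c - c\,h f(0) + O(h^2) = O(h^2).
\end{align*}
```

The cancellation uses only $g_i(0) = 0$ and $g_i^{(1)}(0) = 1$, which is why every admissible pair $(g_1, g_2)$ gives a bias of order $O(h^2)$; the second derivatives $g_i^{(2)}(0)$ enter only through the coefficient of $h^2$.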
Summarizing all the assumptions, it is now clear that $g_c$ should satisfy the following conditions:
(i) $g_c : [0, \infty) \to [0, \infty)$ is continuous and monotonically increasing, and $g_c^{(2)}$ exists and is continuous;
(ii) $g_c^{-1}$ exists, $g_c(0) = 0$, and $g_c^{(1)}(0) = 1$;
(iii) $g_c(y) \to y$ as $c \to 1$.
Functions satisfying conditions (i) to (iii) are easy to construct. The trivial choice is $g_c(y) = y$, which represents the "classical" reflection method estimator. Based on extensive simulations, we observed that the following transformation adapts well to various shapes of distributions:
$$g_c(y) = y + \frac{1}{2} I_c\, y^2 \qquad (6)$$
for $y \ge 0$ and $0 \le c \le 1$, where $I_c = \int_{-1}^{-c} W(t)\,dt$.
Remark: Some discussion of the above choice of $g_c$, and of improvements that could be made, is appropriate here. It is possible to construct functions $g_c$ that improve the bias further under some additional conditions. For instance, if one examines the right hand side of the bias expansion (4), it is not difficult to see that the term inside the brackets (i.e., the coefficient of $h^2$) can be made equal to zero if $g_c$ is chosen appropriately. Indeed, if $g_c$ is chosen such that this coefficient vanishes, then the bias of $\tilde F_{h,K}(x)$ would theoretically be of order $O(h^3)$. For such a function $g_c$, the second derivative at zero, $g_c^{(2)}(0)$, will depend on the ratio $d_1 = f^{(1)}(0)/f(0)$. In this case, the function $g_c$ would probably be some cubic polynomial; see, e.g., Karunamuni and Alberts (2005a, 2005b, 2006). The problem of estimating $d_1$ then naturally arises, as in the preceding papers. Another problem is that the second derivative $g_c^{(2)}(0)$ may not go to 0 as $c \to 1$, as happens in the density estimation context. Thus one may not be able to find any function $g_c$ which satisfies condition (iii), and hence the estimator $\tilde F_{h,K}$ loses the property of being a "natural extension" of the classical estimator outside the boundary region. These are the main reasons why we decided to implement the quadratic function defined in (6) as our choice of transformation.
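A minimal sketch of the resulting estimator may help to fix ideas. It assumes a single transformation $g = g_1 = g_2$ of the quadratic form $g_c(y) = y + \frac{1}{2} I_c y^2$ with $c = \min(x/h, 1)$, the quartic kernel (for which $I_c$ has a closed form), and data supported on $[0, \infty)$, so that only the left boundary is corrected. The function names, data, and bandwidth are hypothetical.

```python
import random


def quartic_cdf(x):
    """W(x): integral of the quartic kernel 15/16 * (1 - t^2)^2 from -1 to x."""
    if x <= -1.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    return 0.5 + (15.0 / 16.0) * (x - 2.0 * x**3 / 3.0 + x**5 / 5.0)


def i_c(c):
    """I_c: integral of W(t) from -1 to -c, in closed form for the quartic kernel."""
    if c >= 1.0:
        return 0.0
    v = -c / 2.0 + (15.0 / 16.0) * (c**2 / 2.0 - c**4 / 6.0 + c**6 / 30.0)
    return max(0.0, v + 5.0 / 32.0)


def g_c(y, c):
    """Assumed quadratic transformation; reduces to the identity when c = 1."""
    return y + 0.5 * i_c(c) * y * y


def classical_cdf(x, sample, h):
    """Uncorrected kernel distribution estimate."""
    return sum(quartic_cdf((x - xi) / h) for xi in sample) / len(sample)


def reflected_cdf(x, sample, h):
    """Generalized reflection estimate with g_1 = g_2 = g_c and c = min(x / h, 1)."""
    c = min(max(x / h, 0.0), 1.0)
    total = 0.0
    for xi in sample:
        gi = g_c(xi, c)
        # Direct term minus the term reflected about the boundary at 0.
        total += quartic_cdf((x - gi) / h) - quartic_cdf(-(x + gi) / h)
    return total / len(sample)


rng = random.Random(0)
data = [rng.expovariate(1.0) for _ in range(50)]  # hypothetical data on [0, inf)
h = 0.8                                           # arbitrary illustrative bandwidth
for x in (0.0, 0.2, 0.4, 0.8, 1.6):
    print(x, classical_cdf(x, data, h), reflected_cdf(x, data, h))
```

At $x = 0$ the two terms of each summand coincide, so the corrected estimate is exactly zero, while the uncorrected estimate is positive whenever some observation lies below $h$. For $x \ge h$ the reflected term vanishes and both estimates agree.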

Simulation
To test the effectiveness of our estimator, we simulated its performance against the reflection method. The simulation is based on 1000 replications. In each replication, the random variables $X_0 \sim \mathrm{Exp}(2)$ and $X_1 \sim \mathrm{Gamma}(3, 2)$ were generated and the estimate of the ROC curve was computed. The probability distributions of both groups $G_0$ and $G_1$ are illustrated in Figure 2.
In all replications sample sizes of $n_0 = n_1 = 50$ were used. In this case, the actual global optimal bandwidths (see Azzalini, 1981) for $F_0$ and $F_1$ are $h_{F_0} = 2.9149$ and $h_{F_1} = 5.8298$, respectively. For the kernel estimation of the cumulative distributions we used the quartic kernel $K(x) = \frac{15}{16}(1 - x^2)^2 I_{[-1,1]}(x)$, where $I_A$ is the indicator function of the set $A$. In our experience, the quality of the estimated curve obtained with this kernel is not very sensitive to the choice of bandwidth. Hence we used this kernel in the next section as well.
For each ROC curve we calculated the mean integrated squared error (MISE) on the interval $[0, 1]$ over all 1000 replications and displayed the results in a boxplot in Figure 3. The variance of each estimator can be accurately gauged by the whiskers of the plot. The means and standard deviations of the MISE for each method are given in Table 1.
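The error summary described above can be sketched as follows. This is an illustration of the procedure only, under simplifying assumptions that differ from the setting of the paper: the estimator is the empirical ROC curve, and both groups are exponential (means 1 and 2), chosen because the true ROC curve is then $\sqrt{p}$ in closed form. The integral is approximated by the trapezoid rule; the function names and the number of replications are hypothetical.

```python
import bisect
import math
import random


def empirical_roc(p, s0, s1):
    """Empirical ROC at p from sorted samples s0 (group G0) and s1 (group G1)."""
    if p >= 1.0:
        return 1.0
    k = max(1, math.ceil((1.0 - p) * len(s0) - 1e-12))
    return 1.0 - bisect.bisect_right(s1, s0[k - 1]) / len(s1)


def ise(estimate, truth, grid_size=200):
    """Trapezoid approximation of the integral over [0, 1] of (estimate - truth)^2."""
    step = 1.0 / grid_size
    total = 0.0
    for k in range(grid_size + 1):
        p = k * step
        weight = 0.5 if k in (0, grid_size) else 1.0
        total += weight * (estimate(p) - truth(p)) ** 2
    return total * step


def simulate(replications, n0=50, n1=50, seed=0):
    """Mean and standard deviation of the integrated squared error over replications."""
    rng = random.Random(seed)
    truth = lambda p: math.sqrt(p)  # true ROC for Exp(mean 1) vs Exp(mean 2) scores
    values = []
    for _ in range(replications):
        # random.expovariate takes the rate, so rate 0.5 means mean 2.
        s0 = sorted(rng.expovariate(1.0) for _ in range(n0))
        s1 = sorted(rng.expovariate(0.5) for _ in range(n1))
        values.append(ise(lambda p: empirical_roc(p, s0, s1), truth))
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))
    return mean, sd


print(simulate(100))
```

Replacing `empirical_roc` by a kernel or boundary corrected estimator, and the exponential samples by the distributions of interest, gives the corresponding summary for other methods, provided the true ROC curve can be evaluated.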
We also obtained 10 typical realizations of each estimator and displayed these in Figure 4. Although boundary effects arise in the estimates of $F_0$ and $F_1$ inside the left boundary region, the quality of the final estimate of the ROC can also be influenced by these effects near the right boundary of the interval $[0, 1]$. As we can see in Figure 4, the biggest difference between the above mentioned methods is in the second half of the interval $[0, 1]$. Table 1 describes the performance of our proposed method with respect to the MISE. The mean and the standard deviation of the MISE were smallest for our proposed estimator. Although the theoretical bias of our estimator is of the same order as that of the reflection method, the numerical results for the ROC curve estimates were better for our estimator in the simulation. In our opinion, this is because our estimator is locally adaptive.

Consumer Loans Data
In this example we used some (unspecified) scoring function to predict the solidity of a client. The goal here is to determine which clients are able to pay their loans. We considered a test set of 332 clients; 309 paid their loans (group $G_0$) and 22 had problems with payments or did not pay (group $G_1$). We used the ROC curve to assess the discrimination between clients with and without good solidity. It is of interest here to know whether our scoring function is a good predictor of solidity.
Figure 4: Estimates of the ROC for our proposed method (1), the reflection method (2), and the classical estimator with boundary effects (3).
Estimates of the ROC are illustrated in Figure 5. The dashed line represents the estimate obtained by our proposed method and the solid line the kernel ROC with boundary effects. To choose the optimal bandwidths for distribution function estimation, we used the method described in Horová, Koláček, Zelinka, and El-Shaarawi (2008). A somewhat similar method for density estimation is given in Sheather and Jones (1991). The optimal bandwidths $\hat h_{F_0}$ and $\hat h_{F_1}$ for the distribution functions $F_0$ and $F_1$ were estimated by this method. From the estimates of the ROC one can see that the scoring function is not a good predictor of the solidity of a client. This finding could also be affected by the different sizes of the two groups: when group $G_1$ is too small, boundary effects become larger. It is clearly visible that the estimate of the ROC obtained by the classical estimator (solid line) takes some values under the diagonal of the unit square. Theoretically, however, this situation should not arise. Thus boundary effects have a larger influence on the quality of the final estimates of the ROC.

Conclusion
In this paper we proposed a new kernel-type distribution estimator to avoid the difficulties near the boundary. The technique implemented is a kind of generalized reflection method that involves reflecting a transformation of the data. The proposed method generates a class of boundary corrected estimators and is based on ideas of boundary correction for kernel density estimators presented in Karunamuni and Alberts (2005a, 2005b, 2006). We showed some good properties of our proposed method (e.g., local adaptivity). Furthermore, it is shown that the bias of the proposed estimator is smaller than that of the "classical" case.

Appendix
Proof of (4). For $x = ch$, $0 \le c \le 1$, using the property
Using a Taylor expansion of order 2 on the function $F(g_1^{-1}(\cdot))$ we have
By the existence and continuity of $F^{(2)}(\cdot)$ near 0, we obtain for $x = ch$
Therefore,
Now, (7) and a Taylor expansion of order 1 of the functions give
From the symmetry of $K$ and the definition of $W(x)$, one can write
and therefore the coefficient of $h$ is zero. So after some algebra we obtain the bias expression as

Proof of (5). Observe that for $x = ch$, $0 \le c \le 1$, we have
where
Using a Taylor expansion as in the last proof, it can be shown that
For $A_{1,2}$ we use the identity $W(t) = 1 - W(-t)$, and similarly as in the last proof we get
Using the continuity of $g_i^{(2)}$, $g_i(0) = 0$, and $g_i^{(1)}(0) = 1$, $i = 1, 2$, and by a Taylor expansion of order 2 on $g_2(g_1^{-1}(\cdot))$, we have
With the preceding expansion we obtain
With the expression obtained for the bias we obtain the expression for $A_2$ as
Now we can express $A_1$ as
Finally, we obtain the variance of the estimator as
boundary points, while the variance of $\hat F_{h,K}(x)$ is of the same order. This fact can be clearly seen by examining the behavior of $\hat F_{h,K}$ inside the left boundary region $[0, h]$. Let $x$ be a point in the left boundary region, i.e., $x \in [0, h]$. Then we can write $x = ch$, $0 \le c \le 1$. The bias and variance of $\hat F_{h,K}(x)$ at $x = ch$ are of the form

Figure 2: The probability distributions of groups $G_0$ and $G_1$.

Figure 5: The estimate of the ROC for the consumer loans data.

Table 1: Means and standard deviations of the MISE.