Gram-Charlier A Series Based Extended Rule-of-Thumb for Bandwidth Selection in Univariate Kernel Density Estimation

Bandwidth parameter estimation in univariate Kernel Density Estimation has traditionally two approaches. Rule(s)-of-Thumb (ROT) achieve ‘ quick and dirty ’ estimations with some speciﬁc assumption for an unknown density. More accurate solve-the-equation-plug-in (STEPI) rules have almost no direct assumption for the unknown density but demand high computation. This article derives a balancing third approach. Extending an assumption of Gaussianity for the unknown density to be estimated in normal reference ROT (NRROT) to near Gaussianity, and then expressing the density using Gram-Charlier A (GCA) series to minimize the asymptotic mean integrated square error , it derives GCA series based Extended ROT (GCAExROT). The performance analysis using the simulated and the real datasets suggests to replace NRROT by a modiﬁed GCAExROT rule achieving a balancing performance by accuracy nearer to STEPI rules at computation nearer to NRROT, speciﬁcally at small samples. Thus, the article also motivates to derive many such ExROTs based on probability density function approximations through other inﬁnite series expansions.


Introduction
Kernel based nonparametric estimation of continuous probability density function (PDF) is widely used in statistics and data analysis. The estimation depends upon the selected kernel function and its spread decided by the smoothing or bandwidth parameter. The selection of kernel has limited impact on optimal PDF estimation, although Epanechnikov kernel is the most optimal (Epanechnikov 1969) if an optimal bandwidth parameter is assumed to be available. On the contrary, the optimal selection of bandwidth parameter avoids either too noisy ('wiggly') or too smooth an estimation.
There exists various bandwidth selection rules for kernel density estimation (KDE). They differ by the selected and the way estimated criterion to measure accuracy of density esti-mation. Asymptotic mean integrated squared error (AMISE) between the estimated density (f (·)) and the actual density (f (·)) is the most widely used performance criteria, though there are many others (Park and Turlach 1992). Estimation of bandwidth parameter minimizing AMISE requires estimating roughness of the second order derivative of the unknown density to be estimated (R(f (·))), where the roughness of a function is defined as an integration of the squared function. The rule(s)-of-thumb (ROT) bandwidth selectors estimate R(f (·)) assuming a reference distribution for f (·) (Silverman 1986). The simplest and the most popular normal reference ROT (NRROT) assumes f (·) to be a normal density. ROT are not among the most optimal, but are used either as a very fast reasonably good estimator or as a first estimator in multistage bandwidth selectors. As ROT depends upon the scale parameter, either standard deviation or interquartile range, of a distribution, they are called scale measures (Janssen, Marron, Veraverbeke, and Sarle 1995). Then, there are plug-in (PI) methods for bandwidth selection in KDE those plug-in kernel based estimate of R(f (·)). Estimation of any nth order derivative of f (·), denoted as (f (n) ), n integer, in turn requires R(f (n+2) ) estimate and a so called pilot bandwidth to start with. The direct PI methods estimate the pilot bandwidth with an assumption of some parametric family, usually Gaussian, for the unknown density (Woodroofe 1970). More accurate solve-the-equation PI (STEPI) methods estimate the pilot bandwidth as a nonlinear function of the bandwidth of interest (Sheather 1983(Sheather , 1986aPark and Marron 1990;Sheather and Jones 1991). Minimization of such a non-linear equation using iterative methods demand high computation of the order of O(N 2 ) usually, where N denotes number of samples. But Raykar and Duraiswami (Raykar and Duraiswami 2006) derived -exact approximation algorithm based STEPI method, referred in this article as STEPI, requiring O(N + M ) order of computation, where M denotes selected number of evaluation points. There are also available some cross-validation (CV) techniques for bandwidth selection in univariate KDE (Scott and Terrell 1987;Jones, Marron, and Sheather 1996). Usually, they have high estimation variance and demand high computation. Also, very often they find an under-smooth bandwidth parameter (Marron 1996). A brief survey of data driven bandwidth selectors in univariate KDE is provided in (Park and Marron 1990;Jones et al. 1996;Park and Turlach 1992;Sheather 1986b;Sheather et al. 2004;Heidenreich, Schindler, and Sperlich 2013). There has also been some efforts towards exploring Bayesian approach, as well, localized bandwidth estimation for KDE (Kulasekera and Padgett 2006;Cheng, Gao, and Zhang 2019). Interested users may explore many other monographs and articles discussing the topic on univariate and multivariate KDE in a more generalized framework of non-parametric estimations and kernel smoothing (Silverman 1986;Wand and Jones 1994;Simonoff 1996;Scott 2012;Turlach et al. 1993;Izenman 1991;Hwang, Lay, and Lippman 1994;Ćwik and Koronacki 1997). Overall, the ROT or scale measures for bandwidth estimation are simple and fast but lack accuracy. And, the STEPI methods are quite accurate but demand high computation. So, a data dependent bandwidth parameter selection technique for KDE that balances both accuracy and computation is needed, and is the goal of this article.
The article achieves this goal by proposing a novel class of Extended ROT (ExROT) for bandwidth selection in univariate KDE. The assumption of a specific reference density for f (·) to estimate R(f ) in ROT is too restrictive. Intuitively, expressing f (·) in terms of an infinite series of higher order statistics based on a more generalized assumption to estimate R(f ) should result into a better bandwidth selection rule. As a verification and demonstration to this intuition, the article derives a bandwidth selection rule by extending the Gaussian assumption for f (·) in NRROT to near Gaussian assumption. The near Gaussian assumption facilitates use of cumulants based Gram-Charlier A (GCA) series as an approximation to f (·) for the estimation of R(f (·)). The extended assumption justifies the nomenclature ExROT for this class of bandwidth estimators and GCAExROT for this specific GCA based ExROT. Also, as they depend upon higher order cumulants unlike the Scale Measures, those just depend upon the scale parameter, they are named Cumulant Measures for bandwidth selection in KDE.
The empirical analysis considered performance measurement against both the simulated, with varying distributions and varying sample sizes from 50 to 50000, and real data setsusing minimum MISE as the performance measure. It proved GCAExROT being better than NRROT for the KDE of unimodal normal mixture densities suggested as test densities by Marron and Wand (1992). But the same estimator proved to be the worst performer compare to the rest for KDE of multimodal normal mixture densities from the same set. The article analysed that the poor performance was due to the misleading estimations of standard deviation and kurtosis for multimodal densities. To overcome such an overestimation of standard deviation, Silverman (Silverman 1986) suggested a modified NRROT, identified as Silverman's ROT (SROT). Along the same lines, the article derives a modified GCAExROT, identified as GCAk4ExROT, rule. The empirical simulations proved GCAk4ExROT better than both the NRROT and SROT at any number of samples, as well, comparable to STEPI at small samples. Importantly, the GCAExROT rules achieve such an improved performance at the computational cost comparable to that of ROT, which is quite less compare to that of STEPI. The success of GCAk4ExROT over NRROT and SROT opens the scope of finding many such extended rules based on other existing infinite series expansions of a specific reference PDF.
The rest of the article is organized as under. Section 2 discusses necessary background on KDE and AMISE criteria for bandwidth selection. Section 3 derives the novel GCAExROT rule for KDE of one dimension. It is empirically verified for its performance against varying density types at a sample size of 50000 in Section 4. The reasons for poor performance of GCAExROT in multimodal densities are analyzed and a modified GCAExROT rule is suggested in Section 5. The new rule is verified for its performance at steady state in the same section. Section 6 tests all the proposed bandwidth estimators against varying number of samples, varying distributions at small samples and against real datasets. Finally, Section 7 concludes the article.

Bandwidth selection criteria and selectors in univariate KDE
Let there be a random variable X with an unknown PDF f (x), x ∈ R. Given N realizations x 1 , x 2 , . . . , x N of X, the kernel density estimatef (x) is given bŷ where K(x) is a kernel function and h is a bandwidth parameter deciding spread of the kernel. Usually, K(x) is a symmetric PDF, i.e. it satisfies the following properties: The accuracy of a PDF estimation can be quantified by available distance measures between PDFs, such as the mean integrated absolute error, the MISE, the Kullback-Leibler divergence and many others. Minimizing a selected distance measure over varying h obtains an optimal h. The most widely used criteria MISE or IMSE (Integrated Mean Square Error) is given as The assumptions that the underlying unknown density is sufficiently smooth, i.e., all derivatives exist, the kernel has finite fourth order moment, as well, lim N →∞ h = 0 and lim N →∞ N h = ∞, i.e., h reduces to 0 at a rate slower than 1/N , obtains (Silverman 1986) an asymptotic large sample approximation of MISE as where R(f ) = R (f (x)) 2 dx and R(K) = R K 2 (x)dx. In general, R(g) = g 2 (x)dx is identified as the roughness of function g(x). The detailed derivation of AMISE can be found in Appendix 7. Equation (3) proves that the smaller bandwidth values will reduce the estimation bias but increase the estimation variance and vice-versa. An optimal h minimizing AMISE(f (x)) in Equation (3) can be given as: Various rules for bandwidth parameter selection based on minimising AMISE criteria differ in the way R(f ), the only unknown parameter in Equation (4), is estimated. With the assumption of Gaussian PDF for f (x), where σ is the standard deviation of f (x). Also, for a standard Gaussian kernel µ 2 (K) = 1 and R(K) = 1 2 √ π . Using them with Equation (4) obtains the simplest NRROT as For more details on bandwidth selectors, refer (Silverman 1986;Wand and Jones 1994;Simonoff 1996).

The Gram-Charlier A series based extended ROT
Bandwidth parameter selection in a one dimensional KDE using AMISE criteria needs the estimation of roughness of second order derivative of the unknown density to be estimated. Instead of Gaussianity, near Gaussianity assumption for an unknown density will allow to approximate it by GCA series expansion. The GCA series expands an unknown PDF as a linear combination of the increasing order differentiations of Gaussian reference PDF, where the coefficients of expansion involve cumulant differences between those of an unknown PDF and a Gaussian reference PDF.
Let there be a random variable X with PDF f (x), x ∈ R, characteristic function F x and cumulant generating function (CGF) C(λ). By definition, where c k denotes kth order cumulant of f (x). Let the Gaussian reference PDF G(x; µ, σ) = where µ is the mean parameter and σ is the standard deviation, be with γ k as the kth order cumulant. Let c k − γ k = δ k or c k = γ k + δ k . Also, it is known that the characteristic function and the PDF are the Fourier transform (F) of each other, in the sense they are dual. Then, re-writing Equation (6) obtains It is evident from Equation (8) that the α k are related to the cumulant differences δ k in the same manner as the moments are related to the cumulants. Taking inverse Fourier transform (F −1 ) of the above Equation (7) and using the property where D denotes derivative operator with respect to x and D (k) denotes kth order derivative operator, obtains GCA series expansion of f (x) as where G (k) (·) denotes kth order derivative of Gaussian. The derivation is a revised version, using a Gaussian density as a reference density, of the Generalized Gram-Charlier (GGC) series expansion derived in a compact way by Berberan-Santos (Berberan-Santos 2007). There are many other ways to derive univariate GCA series (Hald 2000) or univariate Generalized Gram-Charlier (GGC) series (Schleher 1977;Cohen 1998Cohen , 2011.
The kth order derivative of Gaussian is given by where H k is the kth order Hermite polynomial, expressed using recursion as Choosing the first and second order cumulants of the unknown PDF to be the same as those of the reference Gaussian PDF, i.e. c 1 = γ 1 = µ and c 2 = γ 2 = σ 2 , and taking approximation upto fourth order cumulants obtain Taking derivative twice with respect to x of the Equation (9) for GCA series yields: Choosing c 1 = γ 1 = µ, c 2 = γ 2 = σ 2 , approximation up to fourth order cumulants and using By definition, The simplification is obtained using the popularly known Gaussian Integral and its variants, mention as under: where (k)!! = (k − 1)(k − 3)(k − 5) · · · 1 is called the odd factorial of k. Combining these integrals with Equation (11) obtains Simplifying Equation (15) using Equation (18) brings where k 3 = c 3 /σ 3 denotes skewness and k 4 = c 4 /σ 4 denotes kurtosis (truly speaking, excess kurtosis) of a random variable. Using R(f ) in Equation (19) into Equation (4) obtains the Gram-Charlier A (GCA) series based Extended Rule-of-Thumb (GCAExROT) for bandwidth selection in one dimensional KDE using Gaussian kernel as under: The constant C obtained for some of the special cases, and described as under, better explains GCAExROT: 1, if both k 3 = k 4 = 0, i.e., Gaussian PDF 1 + 0.3760k 2 4 + 0.7292k 4 , if k 3 = 0, i.e., symmetric PDF 1 + 1.0938c 2 3 + 0.3760c 2 4 + 0.7292c 4 , if σ = 1.
As evident, NRROT is one special case of GCAExROT when k 3 = k 4 = 0.

Verification of the GCAExROT bandwidth selector
To verify empirical validity of the derived GCAExROT bandwidth selector, a simple experiment was done. The experiment measures the bandwidth estimator's performance on KDE of simulated sources with varying distributions. There was selected a set of 15 normal mixture densities, used as a test-bed for bandwidth estimators in one dimensional KDE by Marron and Wand (Marron and Wand 1992) and shown in Figure 1. To understand the steady state performance, the number of samples were kept 50000. There were considered 100 simulation trails. GCAExROT performance was compared with those of NRROT, as the representative of ROT methods, and STEPI, as the representative of STEPI methods. The performance comparison was against three parameters -the value of estimated bandwidth, the MISE between the estimated and the actual PDFs and the actual time taken (in Seconds) to estimate the bandwidth. The Speed-up achieved by GCAExROT over STEPI is also used for comparison, where the Speed-up is defined as the ratio of estimation time by STEPI to estimation time by GCAExROT.          Table 1 shows the results as mean of 100 trials. The boldface values denote the best per row, i.e., the best bandwidth estimator giving the least MISE for a density in the first column. The boldface italic values in the table denote a better between NRROT and GCAExROT with the same criteria. The results exhibit STEPI as the best bandwidth estimator at 50000 samples giving the least MISE, except for Gaussian and slightly skewed unimodal distributions. For all unimodal distributions other then the Gaussian, i.e., for skewed, strongly skewed, kurtotic and outlier distributions, GCAExROT has been better than NRROT. Among them, for skewed unimodal distribution it is even better than STEPI. For all multimodal distributions, except the claw distribution, NRROT performance is better than that of GCAExROT. The bandwidth comparison in the table shows that for all distributions, except the Gaussian, smaller the bandwidth better the performance in terms of smaller the MISE. The time comparison in the same table shows that the mean time to estimate the bandwidth parameter using NRROT is of the order of a few tenths of millisecond. This includes the time to estimate standard deviation for the rule. The estimation time for GCAExROT is slightly more than that for NRROT and in the range of a few milliseconds at the given 50000 samples. The added extra time includes the time to estimate the third and fourth ordered cumulants. The bandwidth parameter estimation time for STEPI is of the order of a few tens of second. The mean Table 1: Performance comparison of normal reference Rule-of-Thumb (NRROT), Gram-Charlier A series based extended ROT (GCAExROT) and -exact approximation algorithm based solve-the-equation plug-in ( STEPI) bandwidth selectors for one dimensional Kernel Density Estimation (KDE) against the normal mixture densities by Marron and Wand (Marron and Wand 1992) based on the estimated bandwidths, the mean of the integrated square error (MISE) between the estimated and the actual densities and the time to estimate the bandwidth were used as the performance measures. There were used 50000 samples. The Speed-up by GCAExROT over the STEPI is of the order of a few thousands. Throughout the article, the timings are just to provide relative references and are on a machine Intel 2.5 GHz core i5 with 4 GB 1.6 GHz DDR3 RAM and MATLAB R2014b. The operating system for the experiments on a set of normal mixture densities by (Marron and Wand 1992) was MacOS X EI capitan (version 10.11.6) and for the experiments on a set of Generalized Gaussian densities (GGD) was MacOS Mojave (version 10.14.4).
Overall, GCAEXROT has achieved intermediate level accuracy in bandwidth estimations for KDE of non-Gaussian univariate densities but the same proposed bandwidth estimator has performed the worst, even poorer than NRROT, for the multimodal densities in the test. The reasons must be analysed and if possible the bandwidth estimator should be improved. The next session tries to understand how the higher order cumulants affect the bandwidth estimated by GCAExROT and how it can be modified to get the improved performance.

GCAExROT: understanding and modification
Equation (20) interprets that h GCA adjusts h NR in Equation (5) for kurtosis and skewness through the constant C −1/5 , which works as a scaling coefficient.
As in Equation (21), given k 3 = k 4 = 0 causes the scaling coefficient C = 1 and GCAExROT is the same as NRROT. The left panel of the Figure 2 plots the skewness versus the scaling coefficient C −1/5 keeping k 4 = 0 in Equation (20), i. e., considering the effect of skewness parameter independent of the kurtosis parameter. The plot shows that the skewness, whether positive or negative, scales down h NR to give effectively h GCA < h NR .
Similarly, the right panel in Figure 2 shows the plot of kurtosis versus scaling coefficient C −1/5 keeping k 3 = 0, i. e., the effect of kurtosis independent of the skewness. The solid line plot corresponds to GCAExROT and can be understood in four parts. Given the unknown density has k 4 > 0, i.e., it is superGaussian, implies C −1/5 < 1 and h GCA < h NR . Given the unknown density is subGaussian with 0 > k 4 > −1.9394, implies C < 1, C −1/5 > 1 and h GCA > h NR . Also within this range, for 0 > k 4 > −1, h GCA is increasing but thereafter it is decreasing. Further, h GCA < h NR for densities with k 4 < −1.9394, those giving C −1/5 < 1.
GCAExROT was expected to be better than NRROT due to its widen assumption of near-Gaussianity on unknown density. The assumption fits well the unimodal normal mixture densities among the test-bed densities and that's why empirically for these densities GCAExROT gave better accuracy than NRROT. This justifies the bandwidth parameter estimated and in turn the scaling provided by GCAExROT over NRROT based on the skewness and the kurtosis parameters. But then why not the similar performance repeated for multimodal distributions? The comparison of estimated bandwidths in Table 1 reveals that the bandwidth parameters estimated by GCAExROT for multimodal densities were larger than corresponding those by NRROT. This confirms that the kurtosis estimated was in the range 0 > k 4 > −1.9394 as in Figure 2 (b). Examination of the kurtosis value of the generated multimodal test densities reconfirmed this, empirically. This implies that the combination of multiple modes in these densities resulted into underestimated kurtosis and sub-Gaussianity. Thus, the estimated kurtosis failed to represent the local nature of the densities correctly, which consequently, resulted into wrong bandwidth and poor density estimates. Similarly, the standard deviation also gets overestimated locally in case of multimodal distributions. Summarizing, the globally estimated cumulants as per their definitions, like standard deviation and kurtosis, cause the estimated bandwidth to be wrong locally.

Modified GCAExROT proposal
Ideal remedy to get the correct bandwidth through GCAExROT would be to estimate the kurtosis and standard deviation parameters locally for all the modes in case of KDE of multimodal density. But this will add more complexities and not the goal of a simple ROT and ExROT class of bandwidth estimators. Instead, a correction mechanism to avoid such overestimations or underestimations, validated on empirical and real data, would be a more suitable solution. Historically, to overcome oversmoothing of NRROT due to the overestimation of standard deviation, Silverman suggested a correction mechanism. The modified rule, identified as the Silverman's ROT (SROT), simply scales down ROT and given as : Along the same line, this section derives a modified GCAExROT rule by providing corrections to Equation (20). The overestimation of standard deviation can be taken care in the same way as done for SROT, i.e., the coefficient 1.0592 should be replaced by 0.9. The correction for underestimated kurtosis should take care of two aspects: for a given density having local modes with positive kurtosis either the estimated global kurtosis is reduced but still positive or the estimated value is reduced to such an extent that it becomes negative. For example, if the absolute value of estimated kurtosis is considered as a correction mechanism, then it will take care of the latter condition but not the former. So, as a remedy and correction to the underestimated kurtosis, the article suggests to use the actual kurtosis value instead of the excess kurtosis value, i.e., k 4 + 3. Accordingly, the negatively estimated kurtosis will be turned positive and the positively estimated kurtosis will be increased. Thus, the remedy takes care of both the aspects of underestimated kurtosis. Also, instead of verifying whether the distribution is multimodal or unimodal before the actual application of bandwidth selection rule, the correction is always applied to match the goal of simplification. This defines the modified rule GCAk4ExROT as The dashed line plot in the right panel of Figure 2 shows kurtosis versus scaling coefficient C −1/5 in GCAk4ExROT. As can be seen, for Gaussian density, i.e. k 4 = 0, GCAk4ExROT gives C = 6.5716 and thus scales down h SROT by a factor of C −1/5 = 0.6862. This may degrade the performance for Gaussian and slightly near-Gaussian densities. Thus, the new rule should be verified for its performance on the standard test-bed, as well, specifically for densities with kurtosis range near that of Gaussian.

Performance of GCAk4ExROT on the standard test-bed
This section verifies the modified GCAExROT bandwidth selection rule, GCAk4ExROT, for its steady state performance at sample size of 50000. The experiment was repeated for the new rules on the set of normal mixture densities in Figure 1. Comparison of MISE between the actual and the estimated densities due to the estimated bandwidths by NRROT, GCAExROT, SROT, GCAk4ExROT and STEPI bandwidth selection rules are shown in Table 2. The estimated bandwidths by SROT and GCAk4ExROT are also reported in the same table. As the bandwidths by rest of the bandwidth selectors have already been reported, they are not repeated again. The table entries denote the mean of 100 trials. The boldface values show the least MISE per row, i.e., the best bandwidth estimator for a given density type. The boldface italic values denote the best bandwidth estimator among all except STEPI to provide comparison among the ROT and the ExROT class bandwidth estimators.
It can be observed that GCAk4ExROT has been the second best among all the bandwidth estimators and the best among ROT and ExROT class of bandwidth estimators for the KDE of all the multimodal densities, except the Bimodal density for which it has been the best among all. But this performance of GCAk4ExROT is at the cost of slight loss of performance for Gaussian and slightly skewed unimodal densities compare to GCAExROT, as expected. The results show GCAk4ExROT performance slightly degraded even for the outlier density, but it should be noted that the table entries for MISE are of the order of 10 −6 and both GCAExROT and GCAk4ExROT have their performances better than both NRROT and SROT. Overall, the experiment here validates that the modified GCAExROT rule has improved performance over GCAExROT for its steady state behaviour at 50000 samples. This motivates to test it further for its performance.

Performance of GCAk4ExROT on densities with varying Kurtosis
This section verifies the GCAk4ExROT for KDE of densities with varying kurtosis, as they represent more ideal set of densities and explain the bandwidth corrections over GCAExROT. The densities were generated using symmetrical and unimodal Generalized Gaussian Density (GGD) class of densities with varying the shape parameter β. A density with β value of 2 where estimated densities were using bandwidth selection rules from normal reference Rule-of-Thumb (NR-ROT), Gram-Charlier A series based extended ROT (GCAExROT), Silverman's ROT (SROT), GCAk4ExROT and -exact approximation algorithm based solve-the-equation plug-in ( STEPI) bandwidth selectors. The unknown test densities were from the set of normal mixture densities by Marron and Wand (Marron and Wand 1992). There were used 50000 samples. The denote Gaussian distribution, β < 2 denote super-Gaussian distribution and β > 2 denote sub-Gaussian distribution. Figure 3 (a) shows the super-Gaussian densities, i.e., GGD(β) with β ≤ 2 and Figure 3  in in Table 3. It shows the kurtosis calculated as a mean of all 100 simulation trails for the given density, the mean of the estimated bandwidth parameter (h) and the MISE by NRROT, GCAExROT, SROT, GCAk4ExROT and STEPI bandwidth selection algorithms.
The boldface values denote the least MISE and in turn the best bandwidth estimator for that particular type of density. The boldface italic values denote a better between NRROT and GCAExROT in their mutual comparison. The table shows that compare to NRROT, GCAExROT gave less MISE for all GGD(β) set of densities, except the Gaussian, at large sample size of 50000. This reconfirms empirically that the near-Gaussian assumption worked better than the Gaussian assumption for an unknown unimodal density. In turn, this reconfirms the scaling provided by GCAExROT over NRROT, based on kurtosis. Comparing the results from all the bandwidth estimators, for Gaussian density NRROT, for super-Gaussian unimodal densities GCAk4ExROT and for sub-Gaussian unimodal densities GCAExROT were the best. This implies that GCAk4ExROT improved over GCAExROT for super-Gaussian but degraded for Gaussian and sub-Gaussian. Though GCAExROT was the best, NRROT was better than all remaining -SROT, GCAk4ExROT and STEPI -bandwidth estimators for sub-Gaussian densities.
Overall, GCAk4ExROT improved over GCAExROT for super-Gaussian and multimodal densities but at the cost of some loss of performance for Gaussian and sub-Gaussian densities. Though the latter part is theoretically important to understand the new rule in its entirety, empirically it may not be of much significance as its performance on the set of normal mixture densities developed as a test-bed by Marron and Wand (Marron and Wand 1992) and accepted by the researchers in the field has been satisfactorily improving over GCAExROT. Also, it is a fact that both STEPI and SROT also gave poor performances than NRROT for these Gaussian and sub-Gaussian densities. Accordingly, the next section fully explores the potential of GCAExROT and GCAk4ExROT both for their performances.

Performance analysis of ExROT bandwidth selectors
Once verified for their performances at steady state with 50000 samples, the detailed performance analysis and empirical usefulness of the derived GCAExROT and GCAk4ExROT have been targeted in this section. They are compared for their performances with NRROT, SROT and STEPI against varying number of samples, against varying distributions at small sample sets and against real data sets.

Performance against varying number of samples
The first experiment was done to compare performances of the same bandwidth estimators against KDE of varying number of samples. The number of samples were varied from 50 to 50000. Figure 4 shows the MISE plotted against the log of number of samples for all 15 normal mixture densities using the bandwidth estimators NRROT (Dotted Lines), GCAExROT (Dashed Lines), GCAk4ExROT (Thick Dashed Lines) and STEPI (Solid Lines). To achieve visibility, the plots are shown till 2000 samples only. This is also justified as there is almost same linear rate of reduction of MISE with respect to increasing number of samples and there is no much more difference in MISE at large samples for KDE using the given bandwidth estimators. Similar to the results reported at 50000 samples, Figure 4 shows the superiority of NRROT for Gaussian, SROT for skewed Gaussian and STEPI for strongly skewed and kurtotic unimodal densities for any number of samples in terms of achieving minimum mean of the MISE. But un-compare to the results reported at 50000 samples, it shows that at small samples GCAk4ExROT has been the best giving minimum mean of the MISE for outlier density among the unimodal densities and bimodal, skewed bimodal, trimodal, claw, double claw, asymmetrical claw and asymmetrical double claw density types among multimodal densities.
For these densities, at small samples GCAk4ExROT and at large samples STEPI gave the minimum MISE with the transition for the least MISE happening at a few hundreds to a few thousands of samples depending upon the density type. Overall, GCAk4ExROT has been better than both NRROT and SROT for KDE of all except, Gaussian and slightly skewed unimodal (almost Gaussian) density types at any number of samples. It has also been better than STEPI, for KDE of some of the multimodal density types at reduced sample sizes.  (Marron and Wand 1992). There were used Gaussian kernel, Kurtotic unimodal density and 100 simulation trial for each sample set.
Such a performance of GCAk4ExROT is with reduced time complexity compare to that of STEPI. To understand the gain in computational time, the log of the mean Speed-ups with respect to the log of number of samples have been plotted in Figure 5. The plot has a slope slightly less than one. This implies that with increasing number of samples the Speed-up also increases almost linearly. Overall, the results of this experiment with varying number of samples justify the claim that GCAExROT and GCAk4ExROT provide performance balancing both accuracy and computation, i.e., accuracy better than the ROT and computational cost better than the STEPI bandwidth selectors.

Performance against varying distributions at small sample set
As reported before, the performance of various bandwidth estimators do not vary much at steady state, i.e., large number of samples. Also, nowadays small sample performances of the bandwidth estimators are of much more interest. So, here the bandwidth estimators performances over 100 samples and 50 samples have been reported. There were considered 100 simulation trials for each density estimation from the set of normal mixture densities, those shown in Figure 1. Table 4 reports the results for the bandwidth estimated, mean of the MISE and mean speedup achieved by GCAExROT over STEPI -all against varying distributions at 100 samples. Results at 50 samples have been reported using box-plots of estimated MISE due to the bandwidth estimators in the test and are shown in Figure 6. The labels on the x-axis are sequentially NRROT, GCAExROT, SROT, GCAk4ExROT and STEPI. The mean of MISE reported in Table 4 and the median of MISE in Figure 6, confirm the previous observations of GCAk4ExROT being better compare to both NRROT and SROT for KDE of distributions other than Gaussian and (slightly) skewed Gaussian, as well, the same algorithm being better than STEPI for KDE of outlier unimodal density and many other multimodal densities at small samples. It should also be noted that overall STEPI is the best giving minimum mean and median of MISE for many of the densities but it also has large estimation variances compare to all other bandwidth estimator as can be visualized in the box-plots in Figure 6. The MISE was in the range of 10 −2 for 50 samples and varying from 10 −2 to 10 −3 for 100 samples. The mean speed up achieved by GCAExROT over STEPI was also increased from few tens at 50 samples to almost hundred at 100 samples. where estimated densities were using bandwidth selection rules from normal reference Rule-of-Thumb (NR-ROT), Gram-Charlier A series based extended ROT (GCAExROT), Silverman's ROT (SROT), GCAk4ExROT and -exact approximation algorithm based solve-the-equation plug-in ( STEPI) bandwidth selectors. The unknown test densities were from the set of normal mixture densities by Marron and Wand (Marron and Wand 1992). There were used 100 samples. The

Performance against real datasets
The bandwidth estimators were also tested on three different real datasets. The Adult database from the UCI machine learning repository (Lichman 2013) contains 32561 instances with 14 attributes per instance. Among them, five continuous attributes -age, final weight (fnlwgt), capital gain (cap-gain), capital loss (cap-loss) and hours per week (hrs/wk) -were used for bandwidth and the resultant density estimations. The second dataset used was an 'Old faithful' geyser database from R manual, originally from (Azzalini and Bowman 1990). The database contains 272 instances of durations of eruptions and waiting between two consecutive eruptions for the 'Old faithful' geyser in Yellowstone National Park, Wyoming, USA. There are many other versions of the same 'Old faithful' geyser data and that used by Silverman (Silverman 1986) with 107 observations has also been used here for testing.
As the true densities are not known, the estimators are compared based on the estimated bandwidth values and the visual observation of the estimated densities. Table 5 shows mutual comparision of the bandwidths estimated by the bandwidth estimators in test, as well, the Speed-up achieved by GCAk4ExROT over STEPI. The table entries exhibit that the estimated bandwidths by GCAExROT are smaller than those by NRROT but larger than those by STEPI. Similarly, the bandwidths by GCAk4ExROT are larger than those by STEPI and smaller than those by rest all, including SROT. Among all the estimated bandwidth values, the bandwidth selected by GCAk4ExROT is the nearest to the visually selected best     (Azzalini and Bowman 1990). Similarly, plot (c) is for eruptions in 'Old Faithful' geyser database with 107 observations from (Silverman 1986).
value of h = 0.25 by Silverman (Silverman 1986) for the 'Old faithful' geyser dataset with 107 observations. Interested users can verify estimated bandwidth values from many other bandwidth estimators for the same database in (Loader 1999, Section 4). The estimated densities through bandwidths from SROT, GCAk4ExROT and STEPI for some of the variables from Adult database are plotted in Figure 7 and those from 'Old faithful' geyser database are plotted in Figure 8. It can be observed that the densities estimated using STEPI as the bandwidth selection rule are quite noisy ('wiggly'), those estimated using NRROT as the bandwidth selection rule are too smooth and compare to them, the densities estimated through GCAk4ExROT are intermediate, i.e., quite smooth and at the same time without losing any major bumps.

Conclusion and discussion
Towards bandwidth selection rules for one dimensional KDE, NRROT is the most popular and widely used. It is simple being 'quick and dirty' and assumes the unknown density to be estimated as Gaussian. The current article extends this assumption to near-Gaussianity and derives GCA series based ExROT, identified as GCAExROT, in Equation (20). The exper-imental verification of the derived GCAExROT exhibited better performance over NRROT for unimodal densities but poor performance at multimodal distributions. The analysis found that the estimation of the cumulants over multiple modes causes overestimated variance and underestimated kurtosis locally for a given mode. Based on the historical evidence of NR-ROT performance improvement by Silverman deriving SROT, GCAExROT performance has also been improved deriving GCAk4ExROT in (23) by adapting suitable correction mechanisms. Theoretically the extended assumption and empirically the testing on simulated and real datasets proved GCAExROT and the modified rule GCAk4ExROT as 'quick and not so dirty' bandwidth estimators.
Both the bandwidth selectors derived in this article are simple, quick and with rate of convergence of estimation error against number of samples similar to those of NRROT and SROT. GCAExROT gave better accuracy than both NRROT and SROT for KDE of unimodal densities, accept Gaussian, but gave poor accuracy for KDE of multimodal densities. GCAk4ExROT gave better accuracy than both NRROT and SROT for KDE of all, except Gaussian and slightly skewed unimodal, at any given number of samples. At large samples, GCAk4ExROT achieves substantial speed-up compare to the STEPI, and at small samples it achieves accuracy comparable to that of STEPI for some of the multimodal distribution. As such, STEPI gave the smallest bandwidth parameters, resulting into the smallest estimation bias but gave relatively large estimation variance and noisy ('wiggly') density estimates. Compare to it, GCAk4ExROT gave small estimation variance and less noisy ('wiggly') or more smoother density estimates, but more estimation bias compare to STEPI. Thus, GCAk4ExROT has performance intermediate to those of ROT class and STEPI class bandwidth estimators balancing accuracy and computational cost. So, the article recommands GCAk4ExROT as a replacement for NRROT and SROT. With computations no-longer remaining a major problem nowadays, STEPI may not required to be replaced whenever a precision in estimation is required and sufficient number of samples are available. Still one may select GCAk4ExROT over STEPI in case of small samples with slight loss of accuracy against the gain in computational time, the smoothness in estimation and the reduction in estimation variance. GCAExROT has an performance intermediate to those of ROT class and STEPI class bandwidth estimators for unimodal densities. In case, a prior is available that the density to be estimated is unimodal, GCAExROT instead of GCAk4ExROT can be used to replace NRROT.
The improvement needed for GCAExROT may raise a query whether we can consider the concept of applying extended assumption of near-Gaussianity for an unknown density a success or not. It is to be reminded that GCAExROT gave better accuracy for GGD(β) class of unimodal symmetrical densities with varying kurtosis. It has also shown success against skewed and other unimodal normal mixture densities. This implies that the extended assumption worked very well for KDE of the unimodal densities, which fit better the near-Gaussian assumption. But it is known that the real world densities may not be unimodal and thus may not exactly fit the near-Gaussianity assumption. The modifications needed was to make the near-Gaussian assumption valid in most of the real scenario and that is what it achieved. Thus, the extended assumption is really a success in bringing a class of rules balancing accuracy and computational cost.
The results specifically on real datasets raise a query: why the STEPI rule, following a more complex mechanism to estimate bandwidth, has resulted into such noisy estimates? The answer from (Loader 1999) is that the bandwidth estimators estimate optimal h that minimizes the AMISE and it is not optimized to achieve the smoothness. AMISE itself is inadequate to address the smoothness concern simultaneously. In spite of this justification, visual observation of smoothness can not be denounced.
The topic of data driven bandwidth estimator for KDE is well established. Till now, the research had been following two major trends. The first trend intends to give simple, 'quick and dirty' estimate of an unknown density. The intention is achieved by enforcing an assumption of a specific density on the unknown density to be estimated. The second trend intends to achieve more accurate estimate of an unknown density. The methods following this trend employ an iterative technique to solve a nonlinear equation relating density functionals and bandwidth, and improve the bandwidth parameter estimate iteratively. The methods apply assumption of a specific density on the unknown density to be estimated only during a pilot stage. As a whole, they become computationally expensive. The significance of this article can be derived from the fact that it successfully initiates a third trend. The third trend that balances both the previous trends in terms of the approach or the basic assumptions, which in turn also translates into balancing the accuracy and the computations. Instead of either an assumption of a specific density or effectively no such assumption through the multi-stage estimation approach, it assumes the unknown density to be estimated as the member of a generalized group of densities such that it can be expressed as an infinite series expansion through statistics of a known reference density. As these rules not just depend upon the scale parameter but also on the higher order cumulants of an unknown density, the class of these rules can also be identified as the 'Cumulant Measures'. With the better results due to wider assumptions and time taken almost of the same order, the rules through the third approach, the 'quick and not so dirty' rules, can replace the 'quick and dirty' rules.
By successfully deriving GCAExROT and modifications, this article also serves as a particular demonstration of a new class of bandwidth selection rules based on PDF approximations through infinite series. The PDF approximation through infinite series expansion is a well established area and there exist many such approximations based on various reference PDFs. To give some examples, there are Gram-Charlier series expansions based on the Poisson distribution (Aroian 1937), log-normal distribution (Aroian 1947;Schleher 1977), binomial distribution (Rietz 1927), gamma distribution (Bowers 1966), Wishart distribution (Kollo and von Rosen 1995) and others as reference PDFs. As the results achieved in this article are encouraging, many such rules can be developed and evaluated. ExROT based on pdf approximation through infinite series expansion other than GCA series, as well, multivariate generalization of the derived GCAExROT for multivariate KDE are the possible future work.

Appendix AMISE for bandwidth parameter selection in KDE
Given N realizations of an unknown PDF f (x), the kernel density estimatef (x) is given bŷ where, K(u) is the kernel function and h is the bandwidth parameter. Usually, K(u) is a symmetric, positive definite and bounded function; mostly a PDF; satisfying the following properties: The accuracy of a PDF estimation can be quantified by the available distance measures between PDFs, like L 1 norm or mean integrated absolute measure, L 2 norm or mean integrated square error (MISE), Kullback-Leibler divergence and others. The optimal smoothing parameter (the bandwidth) h is obtained by minimizing the selected distance measure. The bandwidth selection rule based on the most widely used criteria MISE or ISE (Integrated Square Error) is derived as under (Silverman 1986;Wand and Jones 1994).
Expanding f (x − hz) using Taylor Series with the assumption of f (x) being a smooth PDF, i.e., all derivatives exist, we get: ⇒ Bias(f (x)) ≈ h 2 2 µ 2 (K)f (x), where µ 2 (K) = z 2 K 2 (z)dz, the second order central moment of the kernel.
Similarly, Var(f (x)) = Var where, s is the mean of x and z = x−s h . Now, expanding f (x − hz) using Taylor Series with assumption of f (x) being a smooth PDF i.e. all derivatives exist Combining equations (25), (26) and (28) MISE(f (x)) = h 4 4 (µ 2 (K)) 2 R(f ) where R(f ) = (f (x)) 2 dx and R(K) = K 2 (z)dz. In general, R(g) = g 2 (z)dz is identified as the roughness of function g(x). An asymptotic large sample approximation AMISE is obtained, assuming lim N →∞ h = 0 and lim N →∞ N h = ∞ i.e. h reduces to 0 at a rate slower than 1/N .
The Equation (29) interprets that a small h increases estimation variance, whereas, a larger h increases estimation bias. An optimal h minimizes the total AMISE(f (x)). So, Thus, the optimal bandwidth parameter depends upon some of the kernel parameters, number of samples and the second derivative of the actual PDF being estimated.