A Ratio Estimator Under General Sampling Design

Recently, many authors have introduced ratio-type estimators for estimating the mean, or the ratio, of a finite population. Most of these articles treat the problem under simple random sampling and make additional assumptions about the auxiliary variable, namely that parameters such as its coefficient of variation and kurtosis are known. Gupta and Shabbir (2008) suggested an alternative form of ratio-type estimator and assumed that the coefficient of variation of the auxiliary variable is known. This assumption is crucial for their estimator. In this article, an estimator of the population ratio under a general sampling design is proposed. Further, the variance of this estimator and an unbiased estimator of that variance are obtained, and the Godambe-Joshi lower bound is asymptotically attainable for this estimator. The proposed estimator does not require the coefficient of variation of the auxiliary variable to be known. Simulations based on a real data set and on artificial populations show that the proposed estimator performs better than the Gupta and Shabbir (2008) and Hartley and Ross (1954) estimators.


Introduction
Consider a finite population $U = \{1, \dots, N\}$. For the $i$th unit, let $y_i$ and $x_i$ be the values of the variable of interest and of the auxiliary variable, respectively. One goal is to estimate the population ratio $\theta = t_y/t_x$, where $t_y = \sum_{i\in U} y_i$ is the population total of the variable of interest and $t_x = \sum_{i\in U} x_i$ is the population total of the auxiliary variable. Another goal is to estimate the population total $t_y$ by $\hat\theta\, t_x$, where $t_x$ is assumed to be known and $\hat\theta$ is an estimator of $\theta$.
It is well known that the Hartley and Ross (1954) estimator is an unbiased estimator of the population ratio $\theta$ under simple random sampling (srs) without replacement. Under a general sampling design, Al-Jararha (2008) obtained an exactly unbiased estimator of $\theta$ that reduces to the Hartley and Ross (1954) estimator under srs. The variance of this estimator and an unbiased estimator of that variance were also obtained. This estimator also works well under stratified sampling designs. Gupta and Shabbir (2008) showed that, under srs, their estimator gives better results than the estimators of Kadilar and Cingi (2004), Kadilar and Cingi (2006a), Kadilar and Cingi (2006b), and Singh and Tailor (2003), and than the regression estimator.
In this article, we propose an estimator of the population ratio $\theta$ under a general sampling design. Through simulations based on a real data set under srs, we compare the proposed estimator with the ratio estimators of Gupta and Shabbir (2008) and Hartley and Ross (1954). Further, the Hartley and Ross (1954) estimator is rewritten for a general sampling design, and this form is compared with the proposed estimator under a probability proportional to size design.
Based on a measurable sampling design $p(\cdot)$, draw a random sample $s$ from $U$. An auxiliary variate $x_i$, correlated with $y_i$, is observed for each unit in the sample $s$. Define $\pi_i$, the first-order inclusion probability, by

$$\pi_i = \Pr(i \in s) = \sum_{s \ni i} p(s).$$

The Horvitz and Thompson (1952) estimator of the population total $t_y = \sum_{i\in U} y_i$ is defined by

$$\hat t_{y\pi} = \sum_{i\in U} \frac{y_i}{\pi_i}\, I_{\{i\in s\}} = \sum_{i\in s} \frac{y_i}{\pi_i},$$

where $I_{\{i\in s\}}$ is one if $i \in s$ and zero otherwise. Since $E_p(I_{\{i\in s\}}) = \pi_i$, $\hat t_{y\pi}$ is an unbiased estimator of $t_y$. Further, $\bar y_s = \hat t_{y\pi}/N$ can be used to estimate the population mean $\bar y_U = t_y/N$.
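The unbiasedness of the Horvitz-Thompson estimator can be checked numerically. The minimal Python sketch below uses a hypothetical five-unit population (not data from this paper). It enumerates every srs sample of size $n = 2$, for which $\pi_i = n/N$, and confirms that the average of $\hat t_{y\pi}$ over all equally likely samples equals $t_y$. Exact `Fraction` arithmetic avoids rounding noise.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def ht_total(sample, y, pi):
    """Horvitz-Thompson estimator of t_y: sum of y_i / pi_i over the sampled units."""
    return sum(Fraction(y[i]) / pi[i] for i in sample)


# Hypothetical toy population (not data from the paper).
y = [3, 7, 2, 9, 4]
N, n = len(y), 2
pi = [Fraction(n, N)] * N  # srs without replacement: pi_i = n / N for every unit

# Every srs sample of size n has probability 1 / comb(N, n).
samples = list(combinations(range(N), n))
assert len(samples) == comb(N, n)

# Design expectation E_p(t_hat): average over all equally likely samples.
expected_total = sum(ht_total(s, y, pi) for s in samples) / len(samples)
expected_mean = expected_total / N  # expectation of the mean estimator t_hat / N

print(expected_total, sum(y))  # 25 25
print(expected_mean)           # 5
```

Enumeration is only feasible for tiny populations, but it makes the design expectation $E_p$ concrete.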

The Hartley and Ross Estimator
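Under srs, Hartley and Ross (1954) proposed the following estimator of the population ratio $\theta$:

$$\hat\theta_{HR} = \bar r_s + \frac{n(N-1)}{N(n-1)\,\bar x_U}\left(\bar y_s - \bar r_s\,\bar x_s\right), \qquad (1)$$

where $r_i = y_i/x_i$, $\bar r_s = n^{-1}\sum_{i\in s} r_i$, and $\bar y_s$ and $\bar x_s$ are the sample means of $y$ and $x$. This estimator can be extended to a general sampling design $p(\cdot)$ by redefining the quantities in equation (1). To find an approximate variance of $\hat\theta_{HR}$ and an estimator of it, expand the right-hand side of equation (1) by a first-order Taylor expansion, which gives equation (2). Taking the variance of both sides of equation (2) gives the approximate variance $\mathrm{var}(\hat\theta_{HR})$. An unbiased estimator of this variance is then obtained, which involves $\pi_{ij}$, the second-order inclusion probability.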
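The exact unbiasedness of the Hartley and Ross estimator under srs can also be verified by enumeration. The Python sketch below assumes the standard srs form $\hat\theta_{HR} = \bar r_s + \frac{n(N-1)}{N(n-1)\bar x_U}(\bar y_s - \bar r_s \bar x_s)$ with $r_i = y_i/x_i$. It uses a hypothetical six-unit population, averages $\hat\theta_{HR}$ over all $\binom{6}{3} = 20$ samples, and compares the result with $\theta = t_y/t_x$.

```python
from fractions import Fraction
from itertools import combinations


def hartley_ross(sample, y, x, N, xbar_U):
    """Hartley-Ross (1954) estimator of theta = ybar_U / xbar_U under srs."""
    n = len(sample)
    r = [Fraction(y[i], x[i]) for i in sample]  # unit ratios r_i = y_i / x_i
    rbar = sum(r) / n
    ybar = Fraction(sum(y[i] for i in sample), n)
    xbar = Fraction(sum(x[i] for i in sample), n)
    # Bias correction: n(N-1) / (N(n-1)) * (ybar - rbar * xbar) / xbar_U
    correction = Fraction(n * (N - 1), N * (n - 1)) * (ybar - rbar * xbar) / xbar_U
    return rbar + correction


# Hypothetical toy population (not data from the paper); x_i > 0 for all units.
y = [3, 7, 2, 9, 4, 6]
x = [1, 2, 4, 3, 2, 5]
N, n = len(y), 3
xbar_U = Fraction(sum(x), N)       # known population mean of x
theta = Fraction(sum(y), sum(x))   # population ratio t_y / t_x

samples = list(combinations(range(N), n))
mean_hr = sum(hartley_ross(s, y, x, N, xbar_U) for s in samples) / len(samples)
print(mean_hr == theta)  # True: exactly unbiased under srs
```

The correction term exactly offsets the bias of the mean of ratios $\bar r_s$, which is why the equality holds for any population with $n \ge 2$.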

The Gupta and Shabbir Estimator
Under srs, Gupta and Shabbir (2008) proposed the estimator

$$\bar y_{GS} = \left[w_1\,\bar y_s + w_2\left(\bar x_U - \bar x_s\right)\right]\frac{\eta\,\bar x_U + \lambda}{\eta\,\bar x_s + \lambda} \qquad (3)$$

to estimate the population mean $\bar y_U$. Here $w_1$ and $w_2$ are weights, and $\eta \neq 0$ and $\lambda$ are either constants or functions of known parameters such as the standard deviation or the variance. The bias and the mean squared error (MSE) of $\bar y_{GS}$ are as corrected by Koyuncu and Kadilar (2010). The optimum values of $w_1$ and $w_2$ minimize the MSE and yield the optimum MSE of $\bar y_{GS}$. These expressions involve $\eta$, $\lambda$, $\bar x_U$, the coefficient of variation $C_y$ of $y$, the correlation coefficient $\rho_{yx}$ between $y$ and $x$ (which can be estimated from the sample), and the coefficient of variation $C_x$ of $x$, which is assumed to be known. Since our goal is to estimate the population ratio $\theta$, dividing equation (3) by $\bar x_U$ gives

$$\hat\theta_{GS} = \frac{\bar y_{GS}}{\bar x_U}, \qquad (6)$$

with $\mathrm{MSE}(\hat\theta_{GS}) = \mathrm{MSE}(\bar y_{GS})/\bar x_U^2$.
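A minimal Python sketch of the Gupta and Shabbir type estimator and its ratio version follows. It assumes the form $\bar y_{GS} = [w_1\bar y_s + w_2(\bar x_U - \bar x_s)](\eta\bar x_U + \lambda)/(\eta\bar x_s + \lambda)$ and $\hat\theta_{GS} = \bar y_{GS}/\bar x_U$. The weights $w_1$, $w_2$ are passed in by the caller. The optimum weights, which depend on $C_y$, $C_x$ and $\rho_{yx}$, are not computed here. The function names are hypothetical.

```python
def gupta_shabbir_mean(ybar_s, xbar_s, xbar_U, w1, w2, eta, lam):
    """Gupta-Shabbir type estimator of ybar_U (assumed form, see lead-in)."""
    if eta == 0:
        raise ValueError("eta must be nonzero")
    regression_part = w1 * ybar_s + w2 * (xbar_U - xbar_s)
    return regression_part * (eta * xbar_U + lam) / (eta * xbar_s + lam)


def gupta_shabbir_ratio(ybar_s, xbar_s, xbar_U, w1, w2, eta, lam):
    """Ratio version: divide the mean estimator by the known population mean xbar_U."""
    return gupta_shabbir_mean(ybar_s, xbar_s, xbar_U, w1, w2, eta, lam) / xbar_U


# With w1 = 1, w2 = 0, eta = 1, lam = 0 the estimator collapses to ybar_s / xbar_s.
print(gupta_shabbir_ratio(12.0, 4.0, 5.0, w1=1.0, w2=0.0, eta=1.0, lam=0.0))  # 3.0

# Hypothetical weights and lam (e.g. lam = C_x, assumed known) for illustration only.
print(gupta_shabbir_ratio(12.0, 4.0, 5.0, w1=0.9, w2=0.5, eta=1.0, lam=0.8))
```

The first call is a sanity check: with those parameter choices the estimator reduces to the classical ratio of sample means.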

The Proposed Estimator
Assume that $x_i > 0$ for all $i = 1, \dots, N$ and that $\bar x_U$ is known. Under a general sampling design $p(\cdot)$, the following estimator $\hat\theta_P$, equation (7), is proposed.
Remark 2.1 $\hat\theta_P$ is not the Hartley and Ross (1954) estimator. The two estimators differ especially for small sample sizes $n$.
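Expanding $\hat\theta_P$ to first order by a Taylor expansion gives equation (8). Hence, to first order, $E_p(\hat\theta_P) = \bar y_U/\bar x_U = \theta$, that is, $\hat\theta_P$ is an unbiased estimator of $\theta$. From equation (8), $\hat\theta_P$ can be rewritten in terms of $Z_i = y_i - r_U x_i$, which yields its approximate variance. This variance can be estimated by an expression based on $\hat Z_i = y_i - r_s x_i$.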
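Remark 2.2 Under mild conditions, asymptotic results for $\hat\theta_P$ can be established, for example a central limit theorem. Under srs, it can be shown that $\mathrm{avar}_{srs}(\hat\theta_P)$ is the asymptotic variance of $\hat\theta_P$ and that $\widehat{\mathrm{avar}}_{srs}(\hat\theta_P)$ is a consistent estimator of $\mathrm{avar}_{srs}(\hat\theta_P)$. Therefore, $(\hat\theta_P - \theta)/\sqrt{\widehat{\mathrm{avar}}_{srs}(\hat\theta_P)}$ converges in distribution to $N(0, 1)$.

Austrian Journal of Statistics, Vol. 41 (2012), No. 2, 105-115

Now consider the model $\xi: y_i = \beta x_i + \varepsilon_i$, where the $\varepsilon_i$ are independent with mean zero and variance $\sigma_i^2$. Let $\hat\theta$ be any estimator of the population ratio $\theta$. The estimation error $\hat\theta - \theta$ can be examined jointly under the model $\xi$ and the sampling design $p(\cdot)$. The anticipated variance (Särndal, Swensson, and Wretman, 1992) is

$$AV(\hat\theta - \theta) = E_\xi E_p\left[(\hat\theta - \theta)^2\right] - \left[E_\xi E_p(\hat\theta - \theta)\right]^2.$$

The Godambe-Joshi lower bound (Godambe and Joshi, 1965) for design-unbiased estimators of $\theta$, with $t_x$ known, is

$$\mathrm{GJLB} = \frac{1}{t_x^2}\sum_{i\in U}\left(\frac{1}{\pi_i} - 1\right)\sigma_i^2.$$

Hence, it can be shown that $AV(\hat\theta_P - \theta)$ approaches the GJLB asymptotically.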
Therefore, the GJLB is asymptotically attainable for $\hat\theta_P$.
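The bound can be evaluated directly for a given design. The Python sketch below uses the standard Godambe-Joshi bound $\sum_{i\in U}(1/\pi_i - 1)\sigma_i^2$ for design-unbiased estimators of $t_y$ and rescales it by $1/t_x^2$ for estimators of $\theta$. It assumes $t_x$ is known, so that $\hat\theta\,t_x$ estimates $t_y$. The four-unit population and the variance structure $\sigma_i^2 = x_i^2$ are hypothetical. They illustrate the known fact that, for a fixed expected sample size, the bound is smallest when $\pi_i \propto \sigma_i$, here $\pi_i \propto x_i$.

```python
def gj_lower_bound_total(pi, sigma2):
    """Godambe-Joshi bound for design-unbiased estimators of t_y under xi:
    sum over U of (1/pi_i - 1) * sigma_i^2."""
    return sum((1.0 / p - 1.0) * s2 for p, s2 in zip(pi, sigma2))


def gj_lower_bound_ratio(pi, sigma2, t_x):
    """Bound rescaled for estimators of theta = t_y / t_x, assuming t_x is known."""
    return gj_lower_bound_total(pi, sigma2) / t_x ** 2


# Hypothetical population: sizes x_i with model variance sigma_i^2 = x_i^2.
x = [1.0, 2.0, 3.0, 4.0]
sigma2 = [xi ** 2 for xi in x]
n, t_x = 2, sum(x)

pi_pips = [n * xi / t_x for xi in x]  # pi_i proportional to x_i (pi-ps design)
pi_srs = [n / len(x)] * len(x)        # pi_i = n / N (srs design)

print(gj_lower_bound_ratio(pi_pips, sigma2, t_x))  # about 0.2
print(gj_lower_bound_ratio(pi_srs, sigma2, t_x))   # about 0.3
```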

Simulation Studies and Conclusions
Consider the real data set USPOP, a summary of the United States population from the 2000 Census, taken from Scheaffer, Mendenhall, and Ott (2006). The percentage of the US population in poverty is 11.9%, both as reported in the data set and as computed from the data. Our main goal in this section is to estimate this number with the different estimators.
The variables of interest are X := Total, the total resident population of each state, and Percent in Poverty, the percentage of the population estimated to live with income below the poverty line. The variable Y := number of residents with income below the poverty line is produced by multiplying Total by Percent in Poverty. Under srs, we compare three estimators: the Hartley and Ross (1954) estimator, several versions of the Gupta and Shabbir (2008) estimator, and the proposed estimator given by equation (7). As suggested by Koyuncu and Kadilar (2010), we consider several choices of $\eta$ and $\lambda$ in equation (6), which define the versions $\hat\theta_{GS(1)}, \dots, \hat\theta_{GS(5)}$. Here $\lambda = \beta_{2(x)}$ denotes the kurtosis of the auxiliary variable X. From the USPOP data, under srs, a random sample of size n is drawn using the SURVEYSELECT procedure of SAS (SAS Institute). Our purpose is to estimate the percent in poverty, $\theta = 11.9\%$.
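Because Y is formed as Total times Percent in Poverty, with no division by 100, the population ratio $\theta = t_y/t_x$ is expressed in percent. It equals the population-weighted average of the state poverty percentages, which is why the target is 11.9 rather than 0.119. The Python sketch below shows this with three hypothetical records. These are not USPOP values.

```python
# Hypothetical toy records (NOT the USPOP data): (total residents, percent in poverty).
states = [(1000, 10.0), (3000, 12.0), (6000, 13.0)]

X = [total for total, _ in states]            # X := Total
Y = [total * pct for total, pct in states]    # Y := Total * Percent in Poverty

theta = sum(Y) / sum(X)  # population ratio t_y / t_x, in percent

# The same number, written as a population-weighted average of the percentages.
weighted_pct = sum(total / sum(X) * pct for total, pct in states)

print(theta)  # 12.4
```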
Consider an artificial population of $N = 200$ units. For $i = 1, \dots, 200$, simulate $x_i$ from exp(1) and, independently, the random error $\varepsilon_i$. Given $x_i$, define $y_i = 8x_i + \varepsilon_i$. We simulate $\varepsilon_i$ from $N(0, x_i)$ and, in a second case, from $N(0, x_i^2)$. When $\varepsilon_i \sim N(0, x_i)$, we have $t_x = 190.7164$, $t_y = 1508.4788$, and $\theta = 7.9095$. When $\varepsilon_i \sim N(0, x_i^2)$, we have $t_x = 190.7164$, $t_y = 1492.4845$, and $\theta = 7.8257$.

Let $\hat\theta^{(k)}$ be the estimate of $\theta$ based on the $k$th of $K$ simulations. The empirical mean (EM) of the estimator $\hat\theta$ is defined by

$$\mathrm{EM}(\hat\theta) = \frac{1}{K}\sum_{k=1}^{K}\hat\theta^{(k)}. \qquad (12)$$

The empirical relative bias (ERB) of $\hat\theta$ is defined by

$$\mathrm{ERB}(\hat\theta) = \frac{\mathrm{EM}(\hat\theta) - \theta}{\theta}. \qquad (13)$$

The empirical mean squared error (EMSE) of $\hat\theta$ is defined by

$$\mathrm{EMSE}(\hat\theta) = \frac{1}{K}\sum_{k=1}^{K}\left(\hat\theta^{(k)} - \theta\right)^2, \qquad (14)$$

and the empirical relative mean squared error (ERMSE) of $\hat\theta$ with respect to $\hat\theta_P$ is defined by

$$\mathrm{ERMSE}(\hat\theta) = \frac{\mathrm{EMSE}(\hat\theta)}{\mathrm{EMSE}(\hat\theta_P)}. \qquad (15)$$

From each of the described populations, under srs and using the SURVEYSELECT procedure of SAS, $K = 1500$ samples are simulated for each sample size $n = 2, 5, 10, 15, 20, 25$. For a given $n$ and based on each sample, $\theta$ is estimated by $\hat\theta_{HR}$, by $\hat\theta_{GS(i)}$ for $i = 0, \dots, 5$, and by $\hat\theta_P$. EM, ERB, and ERMSE are then computed as defined by equations (12), (13), and (15), respectively. The results are given in Tables 1, 2, and 3.
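The simulation design can be sketched in Python with the standard library only. The sketch builds the artificial population ($x_i \sim$ exp(1), $y_i = 8x_i + \varepsilon_i$, with $\mathrm{Var}(\varepsilon_i) = x_i$ or $x_i^2$) and draws $K = 1500$ srs samples. It then computes the EM, ERB, EMSE and ERMSE metrics defined above. Some assumptions apply. The seeds are arbitrary, so the population totals will not reproduce the values reported above. Python's `random` replaces SAS SURVEYSELECT. The proposed $\hat\theta_P$ (equation (7)) is not coded here, so the classical ratio of sample means $\bar y_s/\bar x_s$ serves as a hypothetical stand-in reference in the ERMSE denominator.

```python
import math
import random


def em(estimates):
    """Empirical mean: average of the K estimates."""
    return sum(estimates) / len(estimates)


def erb(estimates, theta):
    """Empirical relative bias: (EM - theta) / theta."""
    return (em(estimates) - theta) / theta


def emse(estimates, theta):
    """Empirical mean squared error: average of (estimate - theta)^2."""
    return sum((e - theta) ** 2 for e in estimates) / len(estimates)


def ermse(estimates, reference, theta):
    """EMSE of an estimator divided by the EMSE of a reference estimator."""
    return emse(estimates, theta) / emse(reference, theta)


def artificial_population(N=200, heteroscedastic=False, seed=1):
    """x_i ~ exp(1); y_i = 8 x_i + eps_i with Var(eps_i) = x_i, or x_i^2 if heteroscedastic."""
    rng = random.Random(seed)
    x = [rng.expovariate(1.0) for _ in range(N)]
    # random.gauss takes a standard deviation, hence sqrt(x_i) for variance x_i.
    y = [8 * xi + rng.gauss(0.0, xi if heteroscedastic else math.sqrt(xi)) for xi in x]
    return x, y


def hartley_ross(sample, x, y, N, xbar_U):
    """Hartley-Ross estimator under srs (requires n >= 2)."""
    n = len(sample)
    rbar = sum(y[i] / x[i] for i in sample) / n
    ybar = sum(y[i] for i in sample) / n
    xbar = sum(x[i] for i in sample) / n
    return rbar + n * (N - 1) / (N * (n - 1)) * (ybar - rbar * xbar) / xbar_U


def ratio_of_means(sample, x, y):
    """Classical ratio estimator ybar_s / xbar_s (stand-in reference only)."""
    return sum(y[i] for i in sample) / sum(x[i] for i in sample)


x, y = artificial_population()
N, n, K = len(x), 10, 1500
theta = sum(y) / sum(x)   # population ratio t_y / t_x
xbar_U = sum(x) / N
rng = random.Random(2)
hr, rat = [], []
for _ in range(K):
    s = rng.sample(range(N), n)  # srs without replacement
    hr.append(hartley_ross(s, x, y, N, xbar_U))
    rat.append(ratio_of_means(s, x, y))

print(f"ERB(HR) = {erb(hr, theta):.4f}")
print(f"ERMSE(HR relative to ratio of means) = {ermse(hr, rat, theta):.3f}")
```

Passing `heteroscedastic=True` gives the second population, with $\varepsilon_i \sim N(0, x_i^2)$.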
It is not straightforward to extend $\hat\theta_{GS}$ to a general sampling design. In contrast, the proposed estimator $\hat\theta_P$ can be used under a general sampling design. So can $\hat\theta_{HR}$, by using equation (1) with the suggested extensions. Therefore, we compare the two estimators $\hat\theta_P$ and $\hat\theta_{HR}$ under probability proportional to size sampling without replacement (πps).
For the USPOP population, the variable X := Total is used as the size variable. Under πps, random samples of size n = 2, 4, 6, 8 are drawn using the SURVEYSELECT procedure of SAS. With the same number of simulations (1500), θ = 11.9% is estimated from each simulated sample by $\hat\theta_{HR}$ and by $\hat\theta_P$. EM, ERB, and ERMSE are then computed over the 1500 simulations. Because of a sampling limitation, n cannot exceed 8: the relative size of each sampling unit must not exceed 1/n, so that every $\pi_i \le 1$. The same procedure is repeated for the artificial populations, with X as the size variable. The results are summarized in Tables 4, 5, and 6.
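The sample-size limitation follows from the πps inclusion probabilities $\pi_i = n x_i / t_x$. Requiring $\pi_i \le 1$ for every unit means $x_i/t_x \le 1/n$, so the largest feasible n is $\lfloor t_x / \max_i x_i \rfloor$. The Python sketch below checks this for a hypothetical size variable. It only computes the $\pi_i$. Actually drawing a πps sample without replacement needs a specific selection scheme, such as one of the PPS methods of PROC SURVEYSELECT, which is not implemented here. The function names are hypothetical.

```python
import math


def pips_probabilities(sizes, n):
    """First-order inclusion probabilities pi_i = n * x_i / t_x for a pi-ps design."""
    t_x = sum(sizes)
    pi = [n * s / t_x for s in sizes]
    if max(pi) > 1:
        raise ValueError(f"n = {n} is infeasible: some unit would have pi_i > 1")
    return pi


def max_pips_sample_size(sizes):
    """Largest n with n * max(x_i) / t_x <= 1 (relative size of every unit <= 1/n)."""
    return math.floor(sum(sizes) / max(sizes))


sizes = [10, 20, 30, 40]  # hypothetical size variable
print(max_pips_sample_size(sizes))    # 2
print(pips_probabilities(sizes, 2))   # [0.2, 0.4, 0.6, 0.8]
```

The $\pi_i$ sum to n, as they must for a fixed-size design. Asking for n = 3 here raises an error because the largest unit would need $\pi_i = 1.2$.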

Results and Conclusions
From Tables 1, 2, and 3, we conclude the following:
• The proposed estimator $\hat\theta_P$ has negligible relative bias, even for small values of n, and its relative bias approaches zero as n increases.
• For all values of n, $\hat\theta_P$ has the lowest empirical relative mean squared error (ERMSE) among the estimators compared. Further, ERMSE($\hat\theta_P$) and ERMSE($\hat\theta_{HR}$) are approximately equal for large sample sizes n.
• The assumption that the coefficient of variation $C_x$ of the auxiliary variable is known, together with other conditions, is crucial for $\hat\theta_{GS}$. This estimator can give worse results if $C_x$ is estimated from the sample, especially for small values of n. In our calculations, $C_x$ is computed from the population.
From Tables 4, 5, and 6, we notice that both estimators have negligible relative bias. However, the proposed estimator $\hat\theta_P$ does much better than $\hat\theta_{HR}$ in terms of ERMSE for n = 2, 4, 6, 8.
The Natural Resources Inventory (NRI) is a real survey conducted by the US Department of Agriculture's Natural Resources Conservation Service (NRCS), in cooperation with Iowa State University's Center for Survey Statistics and Methodology. Its sample design is a stratified two-stage area sample of all US lands (http://www.nrcs.usda.gov/). In stratified sampling designs, the sample size drawn within each stratum is usually small, and the NRI is an example. In such situations, one can apply $\hat\theta_P$ within each stratum, since $\hat\theta_P$ has negligible relative bias and the smallest empirical relative mean squared error among the estimators discussed in this paper.
From the above discussion, we conclude that the estimator $\hat\theta_P$ can be used under a general sampling design and has the smallest empirical relative mean squared error among the estimators discussed in this paper, especially when the sample size is small. Since

Table 4: US population. Comparison between $\hat\theta_P$ and $\hat\theta_{HR}$ under πps sampling, based on 1500 simulations.

Table 5: Artificial population. Comparison between $\hat\theta_P$ and $\hat\theta_{HR}$ under πps sampling, based on 1500 simulations, when $\varepsilon_i \sim N(0, x_i)$.

Table 6: Artificial population. Comparison between $\hat\theta_P$ and $\hat\theta_{HR}$ under πps sampling, based on 1500 simulations, when $\varepsilon_i \sim N(0, x_i^2)$.