Estimation of Finite Population Ratio When Other Auxiliary Variables are Available in the Study

The estimation of the population total $t_y$ using one or more auxiliary variables, and of the population ratio $\theta_{yx} = t_y/t_x$, where $t_x$ is the population total of the variable X, has been discussed extensively in the literature on finite populations. In this paper, the estimation of the finite population ratio $\theta_{yx}$ is extended to exploit an additional auxiliary variable Z that is available in the study. Such a variable can be used to increase the precision of the estimate of $\theta_{yx}$. The idea is motivated by the fact that Z may be more strongly correlated with Y than X is. To our knowledge, this idea has not been discussed in the literature, and it may be extended to p auxiliary variables. The bias, variance, and mean squared error of our approach are given. In a simulation from a real data set, the empirical relative bias and the empirical relative mean squared error are computed for our approach and for several estimators of $\theta_{yx}$ proposed in the literature. Both the analytical and the simulation results show that, with suitable choices, our approach has negligible bias and a smaller mean squared error. Further, under simple random sampling without replacement, the population variances of the estimators used in this paper are computed. Using the same random samples that are used for estimating $\theta_{yx}$, the sample variances of the different estimators are compared with the corresponding population variances; that is, the empirical mean, the empirical relative bias, and the empirical relative mean squared error of the sample variances are reported. In this simulation study, our approach is more efficient than the other estimators proposed in the literature.


Introduction
Consider a finite population U of N units indexed by the set $\{1, 2, \ldots, N\}$. For the ith unit, let $y_i$ and $x_i$ be the values of the variables Y and X, respectively. One of the main interests in survey sampling is to estimate the population ratio $\theta_{yx} = t_y/t_x$, where $t_y = \sum_{i \in U} y_i$ is the population total of the variable Y and $t_x = \sum_{i \in U} x_i$ is the population total of the variable X. The literature contains different approaches to estimating $\theta_{yx}$. To our knowledge, none of them uses an additional auxiliary variable Z available in the study.
Such an auxiliary variable can be used to improve the precision of the estimator of $\theta_{yx}$, and this is the idea developed in this paper.
Under the simple random sampling without replacement (srs) design, Hartley and Ross (1954) proposed an exactly unbiased estimator of $\theta_{yx}$. The proposed estimator is given by
$$\hat\theta_{HR} = \bar r_s + \frac{n(N-1)}{N(n-1)\,\bar x_u}\left(\bar y_s - \bar r_s \bar x_s\right), \tag{1}$$
where $\bar y_s = \sum_{i \in s} y_i/n$, $\bar r_s = \sum_{i \in s} r_i/n$, $r_i = y_i/x_i$, $\bar x_s = \sum_{i \in s} x_i/n$, and $\bar x_u = t_x/N$. This estimator can be rewritten under a general sampling design $p(\cdot)$. In this case, the estimator is no longer unbiased, but its bias remains negligible (Al-Jararha 2012).
Under a general sampling design, Al-Jararha and Al-Haj Ebrahem (2012) proposed an estimator of the population ratio $\theta_{yx}$. Its relative bias is negligible, even for small sample sizes n, and approaches zero as n increases. Under srs, simulation results indicate that this estimator performs better than the Hartley and Ross (1954) estimator. Their estimator is defined by
$$\hat\theta_{JM} = \bar r_s + \frac{1}{\bar x_u}\left(\bar y_s - \bar r_s \bar x_s\right). \tag{2}$$
Under a general sampling design, Al-Jararha (2012) obtained an exactly unbiased estimator of the population ratio $\theta$. Under the srs design, this estimator reduces to the Hartley and Ross (1954) estimator. The variance of this estimator and an unbiased estimator of that variance were also obtained. The estimator works well in stratified sampling designs.
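The estimators in Equations (1) and (2) can be sketched in a few lines of Python for an srs sample. This is a minimal illustration, not the SAS implementation used in the paper: the function names and the toy population are hypothetical, and the classical ratio $\bar y_s/\bar x_s$ is included only for comparison. The enumeration helper averages each estimator over every possible srs sample, which is one way to check the exact unbiasedness of the Hartley and Ross estimator on a small population.

```python
from itertools import combinations
from statistics import mean


def ratio_estimators(y_s, x_s, x_bar_u, N):
    """Return (classical, Hartley-Ross, JM) estimates of t_y / t_x from an srs sample.

    y_s, x_s : sample values; x_bar_u : known population mean of X; N : population size.
    """
    n = len(y_s)
    y_bar = mean(y_s)
    x_bar = mean(x_s)
    r_bar = mean([yi / xi for yi, xi in zip(y_s, x_s)])  # mean of unit ratios r_i
    correction = (y_bar - r_bar * x_bar) / x_bar_u
    classical = y_bar / x_bar
    hartley_ross = r_bar + n * (N - 1) / (N * (n - 1)) * correction  # Equation (1)
    jm = r_bar + correction                                          # Equation (2)
    return classical, hartley_ross, jm


def average_over_all_samples(y, x, n):
    """Average each estimator over every srs sample of size n (toy check only)."""
    N = len(y)
    x_bar_u = mean(x)
    results = [
        ratio_estimators([y[i] for i in s], [x[i] for i in s], x_bar_u, N)
        for s in combinations(range(N), n)
    ]
    return tuple(mean(col) for col in zip(*results))


# Hypothetical toy population, used only for illustration.
y_pop = [3.0, 7.0, 4.0, 10.0, 6.0]
x_pop = [2.0, 5.0, 3.0, 6.0, 4.0]
theta = sum(y_pop) / sum(x_pop)
avg_classical, avg_hr, avg_jm = average_over_all_samples(y_pop, x_pop, n=3)
print(theta, avg_classical, avg_hr, avg_jm)
```

On such a toy population, the average of the Hartley and Ross estimates coincides with $\theta_{yx}$ up to floating-point error, whereas the other two averages need not.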
Define the first-order inclusion probability $\pi_i$ by $\pi_i = \Pr(i\text{th element} \in s) = \sum_{s \ni i} p(s)$.
For $i \neq j$, the second-order inclusion probability is defined by $\pi_{ij} = \Pr(i\text{th and } j\text{th elements} \in s) = \sum_{s \ni i, j} p(s)$.
The Horvitz and Thompson (1952) estimator of the population total $t_y = \sum_{i \in U} y_i$ is defined by
$$\hat t_{y\pi} = \sum_{i \in U} \frac{y_i\, I_{\{i \in s\}}}{\pi_i},$$
where $I_{\{i \in s\}}$ is one if $i \in s$ and zero otherwise. Further, $\bar y_s = \hat t_{y\pi}/N$ can be used to estimate the population mean $\bar y_u = t_y/N$. Note that $\hat t_{y\pi}$ and $\bar y_s$ are unbiased estimators of $t_y$ and $\bar y_u$, respectively. However, $\hat t_{y\pi}$ and $\bar y_s$ do not use the auxiliary variables available in the study. Similarly, $\bar x_s = \hat t_{x\pi}/N$ and $\bar r_s = \hat t_{r\pi}/N$ are unbiased estimators of $\bar x_u$ and $\bar r_u$, respectively.
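As a small illustration of the Horvitz and Thompson estimator, the sketch below computes $\hat t_{y\pi}$ and the corresponding mean estimate from sampled values and their inclusion probabilities. The function name and the numbers are hypothetical; the example assumes srs, where $\pi_i = n/N$ for every unit.

```python
def horvitz_thompson_total(y_s, pi_s):
    """Horvitz-Thompson estimate of t_y: sum of y_i / pi_i over the sampled units."""
    return sum(yi / pi for yi, pi in zip(y_s, pi_s))


# Hypothetical srs example: n = 3 units drawn from N = 12, so pi_i = n / N for every unit.
N, n = 12, 3
y_sample = [4.0, 6.0, 8.0]
pi_sample = [n / N] * n

t_hat = horvitz_thompson_total(y_sample, pi_sample)  # equals N * sample mean under srs
y_bar_hat = t_hat / N                                # estimate of the population mean
print(t_hat, y_bar_hat)
```

Under srs the estimate reduces to N times the sample mean; under an unequal-probability design only the list of inclusion probabilities changes.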
Several auxiliary variables have been used in the literature to estimate the finite population total $t_y$ or the finite population mean $\bar y_u$.
Under srs, Olkin (1958) was the first to address the problem of estimating the population mean using more than one auxiliary variable. His estimator is given by
$$\hat{\bar y}_u = \sum_{i=1}^{p} w_i\, \hat\theta_{yx_i}\, \bar x_{iu},$$
where p is the number of auxiliary variables, $\hat\theta_{yx_i} = \bar y_s/\bar x_{is}$, $w_i$ is the weight of the ith auxiliary variable such that $\sum_{i=1}^{p} w_i = 1$, $\bar y_s$ is the sample mean of Y, and $\bar x_{iu}$ and $\bar x_{is}$ are the population mean and the sample mean of $X_i$, respectively, for $i = 1, \ldots, p$. Singh and Chaudhary (1986) proposed an estimator of the population mean $\bar y_u$ with weights $w_1$ and $w_2$ satisfying $w_1 + w_2 = 1$.
Abu-Dayyeh, Ahmad, Ahmad, and Hassen (2003) studied the general form of the Singh and Chaudhary (1986) estimator. They proposed two classes of estimators that use two auxiliary variables to estimate the population mean of the variable of interest Y. Kadilar and Cingi (2004) suggested a new multivariate ratio estimator that uses the regression estimator in place of the $\bar y_s$ used in the Singh and Chaudhary (1986) estimator. Their estimator, $\bar y_{pr}$, involves the regression coefficients $b_i$, $i = 1, 2$. Based on the mean squared error (MSE), they found that their estimator is more efficient than the Singh and Chaudhary (1986) estimator when $MSE(\bar y_{pr}) < MSE(\bar y_u)$, where $MSE(\bar y_{pr})$ and $MSE(\bar y_u)$ are defined by Equations (2.4) and (1.2) of Kadilar and Cingi (2004), respectively.
Other authors have used different ideas for estimating the population mean $\bar y_u$. There are also different approaches to estimating $\theta_{yx}$ but, to our knowledge, none of them uses an additional auxiliary variable Z. In this article, under a general sampling design, a family of estimators is proposed for estimating the population ratio $\theta_{yx}$. For this family, the bias, variance, and MSE are given.
Based on a simulation from a real data set, we compare our approach with estimators of $\theta_{yx}$ proposed in the literature.

Proposed Family
The existence of one or more auxiliary variables can be used to improve the estimate of $\theta_{yx}$.
In our approach, for the ith unit, let $y_i$, $x_i$, and $z_i$ be the values of the variable of interest Y and of the auxiliary variables X and Z, respectively. Our goal is to estimate the population ratio $\theta_{yx} = t_y/t_x$ when the auxiliary variable Z is available in the study.
Our approach is summarized by rewriting the definition of $\theta_{yx}$ in terms of a given constant $\lambda$ and the ratio $\theta_{zx} = t_z/t_x$. Usually, $t_x$ and $t_z$ are assumed to be known; therefore, we assume $\theta_{zx}$ to be known. Based on this, $\theta_{yx}$ is estimated by the proposed estimator of Equation (4).
Remark 2.1. The estimators $\hat\theta_{yx}$ and $\hat\theta_{yz}$ can be computed from estimators of the population ratio proposed in the literature. Both can be computed from the same estimator of the population ratio or from different estimators.
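Taking the expectation of the proposed estimator in Equation (4) gives Equation (5), from which the bias of the proposed estimator follows as Equation (6).
Remark 2.2. From Equation (6), the proposed estimator is unbiased or asymptotically unbiased when $\hat\theta_{yx}$ and $\hat\theta_{yz}$ are chosen to be unbiased or asymptotically unbiased.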
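Since $\theta_{yz}\theta_{zx} = (t_y/t_z)(t_z/t_x) = \theta_{yx}$, one way to write a family of this kind is sketched below. The identity is exact, but the placement of $\lambda$ on the first term and the symbol $\hat\theta^{*}_{yx}$ are assumptions made here for illustration; they are not confirmed as the authors' notation.

```latex
% Assumed form of the family; the symbol \hat\theta^{*}_{yx} is hypothetical.
\begin{align*}
\theta_{yx} &= \lambda\,\theta_{yx} + (1-\lambda)\,\theta_{yz}\,\theta_{zx}, \\
\hat\theta^{*}_{yx} &= \lambda\,\hat\theta_{yx} + (1-\lambda)\,\hat\theta_{yz}\,\theta_{zx}, \\
% Bias under this assumed form, with \theta_{zx} known:
\operatorname{bias}\bigl(\hat\theta^{*}_{yx}\bigr)
  &= \lambda\,\operatorname{bias}\bigl(\hat\theta_{yx}\bigr)
   + (1-\lambda)\,\theta_{zx}\,\operatorname{bias}\bigl(\hat\theta_{yz}\bigr).
\end{align*}
```

Under this assumed form the bias is a weighted sum of the two component biases, which agrees with Remark 2.2: unbiased components give an unbiased combination for any fixed $\lambda$.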
From Equation (4), the variance of the proposed estimator is given by Equation (7). From Equations (6) and (7), the MSE of the proposed estimator is given by Equation (8). Assume that the proposed estimator is unbiased or asymptotically unbiased, which is achieved by choosing $\hat\theta_{yx}$ and $\hat\theta_{yz}$ to be unbiased or asymptotically unbiased. In this case, the MSE of the proposed estimator equals its variance. The optimal value of $\lambda$ is obtained by differentiating the right-hand side of Equation (8) with respect to $\lambda$, equating the derivative to zero, and solving for $\lambda$; this gives $\lambda_{opt}$ in Equation (9), where $\lambda^{*}$ is defined by Equation (10). From Equations (4) and (9), the optimal estimator of $\theta_{yx}$ is given by Equation (11).
Remark 2.3. In general, the transformation given by Equation (11) is not a convex transformation. However, it is convex when $0 \le \lambda_{opt} \le 1$, and this condition holds if $\lambda^{*} \ge 0$. In this case, the numerator and the denominator of $\lambda^{*}$ should both be positive; equivalently, from Equation (10), a condition must hold on $\rho(\hat\theta_{yx}, \hat\theta_{yz})$, the correlation between $\hat\theta_{yx}$ and $\hat\theta_{yz}$.
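The minimization over $\lambda$ can be illustrated numerically. The sketch below assumes that the proposed estimator has the form $\lambda A + (1-\lambda)\,\theta_{zx} B$, where A and B stand for $\hat\theta_{yx}$ and $\hat\theta_{yz}$; this form, the function names, and the input values are assumptions for illustration only, and the closed-form minimizer is derived for that assumed form rather than copied from Equation (9).

```python
def lambda_opt(var_yx, var_yz, cov, theta_zx):
    """Lambda minimizing var(lam * A + (1 - lam) * theta_zx * B).

    A, B stand for the estimators of theta_yx and theta_yz; var_yx = var(A),
    var_yz = var(B), cov = cov(A, B). The combined form is an assumption.
    """
    num = theta_zx ** 2 * var_yz - theta_zx * cov
    # The denominator is var(A - theta_zx * B); it must be positive.
    den = var_yx + theta_zx ** 2 * var_yz - 2 * theta_zx * cov
    return num / den


def combined_variance(lam, var_yx, var_yz, cov, theta_zx):
    """Variance of the assumed combination for a given lambda."""
    return (lam ** 2 * var_yx
            + (1 - lam) ** 2 * theta_zx ** 2 * var_yz
            + 2 * lam * (1 - lam) * theta_zx * cov)


# Hypothetical inputs, chosen only to illustrate the computation.
v_yx, v_yz, c, t_zx = 0.04, 0.01, 0.005, 1.5
lam = lambda_opt(v_yx, v_yz, c, t_zx)
v_opt = combined_variance(lam, v_yx, v_yz, c, t_zx)
print(lam, v_opt)
```

With these inputs the minimizing $\lambda$ lies between 0 and 1, so the combination is convex in the sense of Remark 2.3; a large positive covariance can push it outside that interval.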
In real applications, $\lambda_{opt}$ is unknown; however, it can be estimated from a random sample. Under a general sampling design $p(\cdot)$, draw the random sample s and estimate $\lambda_{opt}$ by its sample-based estimate $\hat\lambda_{opt}$. From Equation (11), the estimator of $\theta_{yx}$ is then computed from Equation (14). In the next section, we describe how our approach can be applied. In most practical cases, $t_x$ and $t_z$ are known from previous studies or from a pilot study. The worst scenario occurs when $\theta_{zx} = t_z/t_x$ is unknown. In this case, $\theta_{yx}$ is estimated by Equation (15), where $\hat\theta_{zx}$ is an estimate of $\theta_{zx}$. Our goal is to find the bias, variance, and MSE of this estimator.
As is clear from Equation (15), the estimator is not a linear function of $\hat\theta_{zx}$ and $\hat\theta_{yz}$. To avoid third- and fourth-order inclusion probabilities, the right-hand side of Equation (15) is expanded to first order using a Taylor expansion, which gives Equation (16).
Remark 2.4. First-order linearization is widely used in survey practice, but in general it is very difficult to evaluate the quality of the approximation analytically. Therefore, simulations are presented that show reasonable results, at least in the particular case described.
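The linearization step can be illustrated on the product $\hat\theta_{yz}\hat\theta_{zx}$, which is the kind of nonlinear term the text refers to. The first-order expansion around $(\theta_{yz}, \theta_{zx})$ below is a generic sketch; whether Equation (16) has exactly this layout is an assumption.

```latex
% First-order Taylor expansion of a product of two estimators (generic sketch).
\[
\hat\theta_{yz}\,\hat\theta_{zx}
  \approx \theta_{yz}\,\theta_{zx}
  + \theta_{zx}\,\bigl(\hat\theta_{yz} - \theta_{yz}\bigr)
  + \theta_{yz}\,\bigl(\hat\theta_{zx} - \theta_{zx}\bigr).
\]
```

The only term discarded is $(\hat\theta_{yz} - \theta_{yz})(\hat\theta_{zx} - \theta_{zx})$, a product of two estimation errors, so the approximation is linear in both estimators and its variance involves only first- and second-order inclusion probabilities.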
From Equation (16), the bias of the estimator is given by Equation (17) and its variance by Equation (18). From Equations (17) and (18), the MSE of the estimator is given by Equation (19).
Remark 2.5. The right-hand side of Equation (17) makes clear the need to use unbiased or asymptotically unbiased estimators of $\theta_{yx}$, $\theta_{zx}$, and $\theta_{yz}$. In this case, the bias is zero or asymptotically zero; that is, the estimator is unbiased or asymptotically unbiased for $\theta_{yx}$.
As a result, under the assumption that $\hat\theta_{yx}$, $\hat\theta_{zx}$, and $\hat\theta_{yz}$ are unbiased (or asymptotically unbiased) estimators of $\theta_{yx}$, $\theta_{zx}$, and $\theta_{yz}$, respectively, the optimum value of $\lambda$ is the value that minimizes the right-hand side of Equation (19). In real applications, the first case, in which $\theta_{zx} = t_z/t_x$ is known, is more common than the second case, in which $\theta_{zx}$ is unknown. Therefore, in the next section we describe how the first approach can be applied. The second approach can be applied in a similar way.

Applying Our Approach
In this section, we apply the first case, in which $\theta_{zx} = t_z/t_x$ is known. The second case, in which $\theta_{zx}$ is unknown, can be handled in a similar way. Based on Remark 2.2, we restrict ourselves to estimating $\theta_{yx}$ and $\theta_{yz}$ by unbiased or asymptotically unbiased estimators from the literature. In this paper, we use the classical ratio estimator and the estimators given by Equations (1) and (2).

Classical Ratio Estimator
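In this subsection, $\hat\theta_{yx}$ and $\hat\theta_{yz}$ are computed from the usual classical ratio estimator, that is, from
$$\hat\theta_{yx} = \frac{\hat t_{y\pi}}{\hat t_{x\pi}} \tag{24}$$
and
$$\hat\theta_{yz} = \frac{\hat t_{y\pi}}{\hat t_{z\pi}}, \tag{25}$$
respectively. In this case, $var(\hat\theta_{yx})$, $var(\hat\theta_{yz})$, and $cov(\hat\theta_{yx}, \hat\theta_{yz})$ are given by Equations (26), (27), and (28), respectively, where the required quantities are defined by Equation (29) and
$$w_i = \frac{y_i - \theta_{yz}\, z_i}{N \bar z_u}. \tag{30}$$
For more details, see Al-Jararha and Al-Haj Ebrahem (2012).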
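The role of the linearized variable in Equation (30) can be sketched for the srs case. The function below is a hypothetical illustration, not Equations (26) to (28) themselves: it assumes srs, uses the population value of $\theta_{yz}$ in $w_i$, and approximates the variance of the classical ratio estimator by the srs variance of the Horvitz and Thompson total of the $w_i$.

```python
from statistics import mean, variance


def linearized_ratio_variance_srs(y, z, n):
    """Approximate srs variance of t_y_hat / t_z_hat via first-order linearization.

    y, z : full population values (hypothetical, used only to show the formula);
    n : sample size. Uses w_i = (y_i - theta * z_i) / (N * z_bar_u).
    """
    N = len(y)
    theta = sum(y) / sum(z)
    z_bar_u = mean(z)
    w = [(yi - theta * zi) / (N * z_bar_u) for yi, zi in zip(y, z)]
    f = n / N
    # Variance of the Horvitz-Thompson total of w under srs: N^2 (1 - f) / n * S_w^2
    return N ** 2 * (1 - f) / n * variance(w)


# Hypothetical toy population, used only for illustration.
y_pop = [3.0, 7.0, 4.0, 10.0, 6.0]
z_pop = [2.0, 5.0, 3.0, 6.0, 4.0]
print(linearized_ratio_variance_srs(y_pop, z_pop, n=3))
```

When Y is exactly proportional to Z, every $w_i$ is zero and the approximate variance vanishes, which is the sense in which a well-correlated auxiliary variable improves precision.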

Hartley and Ross Estimator
Under the srs design, Hartley and Ross (1954) proposed an exactly unbiased estimator of the population ratio. This estimator can be rewritten under a general sampling design (Al-Jararha 2012). In this case, $\hat\theta_{yx}$ and $\hat\theta_{yz}$ are computed from Equations (31) and (32), respectively. To compute $var(\hat\theta_{yx})$, $var(\hat\theta_{yz})$, and $cov(\hat\theta_{yx}, \hat\theta_{yz})$, reuse Equations (26), (27), and (28), but with the variables $\hat w_i$ redefined for the Hartley and Ross estimator. For more details, see Al-Jararha and Al-Haj Ebrahem (2012).

Al-Jararha and Al-Haj Ebrahem Estimator
Under a general sampling design, Al-Jararha and Al-Haj Ebrahem (2012) proposed an asymptotically unbiased estimator of the population ratio. This estimator performs better than the Hartley and Ross (1954) estimator. In this case, $\hat\theta_{yx}$ and $\hat\theta_{yz}$ are computed from Equations (35) and (36), respectively. To compute $var(\hat\theta_{yx})$, $var(\hat\theta_{yz})$, and $cov(\hat\theta_{yx}, \hat\theta_{yz})$, reuse Equations (26), (27), and (28), but with the variables redefined for this estimator; these definitions involve another estimator of $\theta$.
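From Equation (14), recall our approach. To make the notation clear, consider the following grouping of the nine estimators:

Group   $\hat\theta_{yx}$ from   $\hat\theta_{yz}$ from Eq. (25)   $\hat\theta_{yz}$ from Eq. (32)   $\hat\theta_{yz}$ from Eq. (36)
I       Eq. (24)                 $\hat\theta_{yx.RR}$              $\hat\theta_{yx.RH}$              $\hat\theta_{yx.RJ}$
II      Eq. (31)                 $\hat\theta_{yx.HR}$              $\hat\theta_{yx.HH}$              $\hat\theta_{yx.HJ}$
III     Eq. (35)                 $\hat\theta_{yx.JR}$              $\hat\theta_{yx.JH}$              $\hat\theta_{yx.JJ}$

To use Equation (42), for the ith group compute the EMSE of each estimator in the group and the EMSE of the estimator $\hat\theta_{ww}$ corresponding to that group.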
From the described population, m = 3,000 samples are simulated under different sampling designs, namely srs, πps, and stratified sampling, with sample sizes n = 20, 30, 40, 50, and 60. Sampling from the population is carried out with the SURVEYSELECT procedure of SAS (SAS Institute), and the computations are performed with a macro written in SAS. For each sample of size n, compute the estimators $\hat\theta_{yx.wv}$ and $\hat\theta_{ww}$, for w, v = R, H, J, as described above.
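The structure of such a simulation can be sketched in Python for the srs design. This is not the SAS macro used in the paper: the population is hypothetical, only the classical ratio estimator is simulated, and the definitions of the empirical relative bias and empirical relative MSE used here (mean error divided by $\theta$, and mean squared error divided by $\theta^2$) are assumptions that may differ from the paper's own definitions.

```python
import random
from statistics import mean


def simulate_srs(y, x, n, m, estimator, seed=0):
    """Draw m srs samples of size n and return the m estimates of t_y / t_x."""
    rng = random.Random(seed)
    N = len(y)
    estimates = []
    for _ in range(m):
        idx = rng.sample(range(N), n)  # srs without replacement
        estimates.append(estimator([y[i] for i in idx], [x[i] for i in idx]))
    return estimates


def empirical_relative_bias(estimates, theta):
    """Assumed definition: (mean of estimates - theta) / theta."""
    return (mean(estimates) - theta) / theta


def empirical_relative_mse(estimates, theta):
    """Assumed definition: mean squared error divided by theta squared."""
    return mean([(e - theta) ** 2 for e in estimates]) / theta ** 2


def classical_ratio(y_s, x_s):
    return mean(y_s) / mean(x_s)


# Hypothetical population: y is roughly proportional to x with a +/-1 perturbation.
x_pop = [float(i) for i in range(1, 41)]
y_pop = [2.0 * xi + (1.0 if i % 2 == 0 else -1.0) for i, xi in enumerate(x_pop)]
theta = sum(y_pop) / sum(x_pop)

est = simulate_srs(y_pop, x_pop, n=20, m=200, estimator=classical_ratio)
erb = empirical_relative_bias(est, theta)
ermse = empirical_relative_mse(est, theta)
print(erb, ermse)
```

Any other estimator with the same two-argument signature can be passed as `estimator`, so the same loop serves to compare several estimators on identical samples by reusing the seed.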

Variance Estimation for the Estimators of θyx
In this section, under srs, our main goal is to compute the population variances of the 12 estimators described in Subsection (4.1). Further, we compute the empirical mean, relative bias, and MSE of the sample variances computed from the random samples simulated in Subsection (4.1).
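Recall that $\hat t_{y\pi} = \sum_{i \in U} y_i I_{\{i \in s\}}/\pi_i$ is the Horvitz and Thompson (1952) estimator of the population total $t_y = \sum_{i \in U} y_i$. Under srs (Särndal, Swensson, and Wretman 1992),
$$var_{srs}(\hat t_{y\pi}) = N^2\, \frac{1-f}{n}\, S^2_{yU} \tag{44}$$
and
$$\widehat{var}_{srs}(\hat t_{y\pi}) = N^2\, \frac{1-f}{n}\, S^2_{ys}, \tag{45}$$
where
$$S^2_{yU} = \frac{1}{N-1} \sum_{i \in U} (y_i - \bar y_u)^2, \qquad S^2_{ys} = \frac{1}{n-1} \sum_{i \in s} (y_i - \bar y_s)^2,$$
and $f = n/N$. Similarly, the covariance between $\hat t_{y\pi}$ and $\hat t_{z\pi}$ is computed from
$$cov_{srs}(\hat t_{y\pi}, \hat t_{z\pi}) = N^2\, \frac{1-f}{n}\, S_{yzU}, \tag{46}$$
which is estimated by
$$\widehat{cov}_{srs}(\hat t_{y\pi}, \hat t_{z\pi}) = N^2\, \frac{1-f}{n}\, S_{yzs}, \tag{47}$$
where
$$S_{yzU} = \frac{1}{N-1} \sum_{i \in U} (y_i - \bar y_u)(z_i - \bar z_u)$$
and
$$S_{yzs} = \frac{1}{n-1} \sum_{i \in s} (y_i - \bar y_s)(z_i - \bar z_s).$$
Remark 4.1. Since the 12 estimators discussed in Subsection (4.1) are linearized by a first-order Taylor expansion (Al-Jararha and Al-Haj Ebrahem 2012), Equations (44)-(47) can be applied directly to these estimators. The computations in this part are similar to those in Subsection (4.1), but for variances.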
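The srs variance of the Horvitz and Thompson total, $N^2 (1-f) S^2_{yU}/n$, and the unbiasedness of its sample-based estimator can be checked by brute force on a small population. The sketch below uses a hypothetical toy population and hypothetical function names; it enumerates every srs sample, so it is only feasible for very small N.

```python
from itertools import combinations
from statistics import mean, variance


def var_srs_total(y, n):
    """Population variance of the HT total under srs: N^2 (1 - f) / n * S_y^2."""
    N = len(y)
    return N ** 2 * (1 - n / N) / n * variance(y)


def var_srs_total_hat(y_s, N):
    """Sample-based estimator of the same variance, with S_y^2 replaced by s_y^2."""
    n = len(y_s)
    return N ** 2 * (1 - n / N) / n * variance(y_s)


# Hypothetical toy population; enumerate every srs sample of size n.
y_pop = [3.0, 7.0, 4.0, 10.0, 6.0, 8.0]
N, n = len(y_pop), 3
samples = [[y_pop[i] for i in s] for s in combinations(range(N), n)]

totals = [N * mean(s) for s in samples]                    # HT totals under srs
exact_var = mean([(t - sum(y_pop)) ** 2 for t in totals])  # design variance by enumeration
formula_var = var_srs_total(y_pop, n)
mean_var_hat = mean([var_srs_total_hat(s, N) for s in samples])
print(exact_var, formula_var, mean_var_hat)
```

The three printed values agree up to floating-point error: the formula reproduces the design variance, and the variance estimator averages to it over all samples.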
The empirical mean (MV) of $\widehat{var}_{srs}(\hat\theta)$ is the average of the sample variances over the m simulated samples, where $\widehat{var}_{srs}(\hat\theta)$ is an estimator of $var_{srs}(\hat\theta)$.
Under srs, population variances are computed for every estimator mentioned in Subsection (4.1). Further, every simulated sample used for computing these estimators is also used to compute the sample variances of the 12 estimators. Results are reported in Table (5).
This subsection is restricted to the srs design because other sampling designs are difficult to handle. For example, under πps, the SURVEYSELECT procedure gives the first- and second-order inclusion probabilities for the sample only. Even under srs, the computations are not an easy task.

Results and Conclusions
The nine estimators $\hat\theta_{yx.wv}$, for w, v = R, H, J (that is, $\hat\theta_{yx.RR}$, $\hat\theta_{yx.RH}$, $\hat\theta_{yx.RJ}$, and so on), are used to estimate $\theta_{yx}$ based on our approach; they use the additional auxiliary variable Z available in the study. The three estimators $\hat\theta_{ww}$, for w = R, H, J, do not use Z. For group I, $\hat\theta_{RR}$ is computed from Equation (24); for group II, $\hat\theta_{HH}$ is computed from Equation (31); and for group III, $\hat\theta_{JJ}$ is computed from Equation (35). The computation of $\hat\theta_{RR}$, $\hat\theta_{HH}$, and $\hat\theta_{JJ}$ depends only on the variable of interest Y and the auxiliary variable X.
From Tables (1), (2), (3), and (4), we can conclude the following:
1. The nine estimators $\hat\theta_{yx.wv}$, for w, v = R, H, J, have negligible empirical relative bias regardless of the sample size n and the group. This follows from the behavior of the estimators used in each group described above. In general, from Equation (6), the bias of the proposed estimator depends on the behavior of $\hat\theta_{yx}$ and $\hat\theta_{yz}$; these must be unbiased or asymptotically unbiased for $\theta_{yx}$ and $\theta_{yz}$, respectively.
2. The estimators $\hat\theta_{yx.wv}$, for w, v = R, H, J, perform much better than the estimators $\hat\theta_{ww}$, for w = R, H, J, in terms of the empirical relative mean squared error. In other words, an additional auxiliary variable can be used to improve the precision of the estimate of the population ratio $\theta_{yx}$.
The population variances and the empirical means, relative biases, and relative mean squared errors of the sample variances for the estimators discussed in Subsection (4.1) are reported in Table (5). From this table, we can see that all the discussed estimators have negligible relative bias. Further, in terms of relative efficiency, the estimators based on our approach, $\hat\theta_{yx.wv}$, for w, v = R, H, J, are more efficient than the estimators $\hat\theta_{ww}$, for w = R, H, J. These results hold regardless of the sample size n.
The absolute differences between the EV from Table (1) and the MV from Table (5) are summarized in Table (6). From Table (6), we can see that all the absolute differences are negligible regardless of the sample size.
As a final remark, our approach can be adopted if we carefully choose the estimators $\hat\theta_{yx}$ and $\hat\theta_{yz}$ to be unbiased or asymptotically unbiased for $\theta_{yx}$ and $\theta_{yz}$, respectively. In this case, our approach can be used to improve the precision of the estimate of the population ratio $\theta_{yx}$. Further, by similar steps, our ideas can be extended to use more than one auxiliary variable.

Table 3:
Stratified sampling design: under srs, a random sample of size $n_h$ is drawn from each stratum, and the samples are combined into one sample of size n.

Table 4:
Stratified sampling design: under πps, a random sample of size $n_h$ is drawn from each stratum, and the samples are combined into one sample of size n.

Table 5:
Under srs: comparisons between the variances of the different estimators. var := population variance of the estimator.

Table 6:
Under srs: the entries of this table are the absolute differences between the EV of Table (1) and the MV of Table (5).