An Efficient Estimation Method Based on Double Phase Sampling
Austrian Journal of Statistics, Vol. 36 (2007), No. 4, 319-328

Abstract: In this paper, an estimation method based on double phase sampling is proposed to improve the efficiency of estimating the population mean. An extension to the bivariate case is presented for estimating the parameters of the simple linear regression model. This study shows that, for symmetric populations, the proposed estimator of the population mean is unbiased and more efficient than the traditional estimator based on a simple random sample. Results for the standard uniform and the exponential distributions are given. Simulation results show that the proposed method is also more efficient than the traditional one for estimating the regression parameters. An application to a real data set is also given.


Introduction
Numerous procedures have been proposed for increasing the precision of parameter estimation. One such procedure is double phase sampling, which has been around for a long time (Deming, 1953). In this paper, we propose a modified Theil-type nonparametric method based on double phase sampling to estimate the population mean and the parameters of the simple linear regression model. The proposed method is an extension of the simple AM sampling procedure introduced by the authors of this paper (Al-Nasser and Al-Haj Ebrahem, 2005; Al-Haj Ebrahem and Al-Nasser, 2005). The simple AM method can be described as follows:
1. Arrange the observations in ascending order of the $x_i$'s, i.e. $x_{(1:n)} \le x_{(2:n)} \le \dots \le x_{(n:n)}$, and take the associated values $y_{[1]}, y_{[2]}, \dots, y_{[n]}$ of the original data. Thus the new pairs are $(x_{(i:n)}, y_{[i]})$, $i = 1, \dots, n$, where $x_{(i:n)}$ is the $i$-th ordered observation from a sample of size $n$.
2. Divide the ordered data into $m$ subgroups, each of size $r$, such that $mr = n$; the $j$-th subgroup consists of the pairs $(x_{((j-1)r+k:n)}, y_{[(j-1)r+k]})$, $k = 1, \dots, r$. Note that $m$ can be chosen as the largest divisor of $n$ such that $m \le r$ (equivalently $m \le \sqrt{n}$, since $r = n/m$).
3. Find all possible paired slopes.
4. Define the estimator of the slope from these paired slopes.
Based on a simulation study, the authors demonstrated that the simple AM method can be considered a good alternative to the traditional methods, because it produced satisfactory results.
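Steps 1 and 2 of the simple AM method can be made concrete with a minimal Python sketch. It assumes the data are held as (x, y) tuples and that each subgroup is a run of consecutive ordered pairs; the helper names `choose_m` and `am_subgroups` are hypothetical, and steps 3 and 4 are not sketched.

```python
import math
from typing import List, Sequence, Tuple

Pair = Tuple[float, float]


def choose_m(n: int) -> int:
    """Largest divisor m of n with m <= r = n // m (equivalently m <= sqrt(n))."""
    for m in range(math.isqrt(n), 0, -1):
        if n % m == 0:
            return m
    raise ValueError("n must be a positive integer")


def am_subgroups(pairs: Sequence[Pair]) -> List[List[Pair]]:
    """Steps 1-2 (sketch): order the pairs by x, with each y kept next to its x,
    then split them into m subgroups of r consecutive ordered pairs."""
    ordered = sorted(pairs, key=lambda p: p[0])  # step 1: pairs (x_(i:n), y_[i])
    n = len(ordered)
    m = choose_m(n)
    r = n // m
    return [ordered[j * r:(j + 1) * r] for j in range(m)]  # step 2
```

For example, with n = 20 the sketch picks m = 4 and r = 5, the same subgroup size as the AM sample in Table 2.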
The paper is organized as follows. Estimation of the population mean using an extension of the AM method, together with results for the standard uniform and exponential distributions, is presented in Section 2. The proposed sampling plan for estimating the regression coefficients is described in Section 3. An application to a real data set is given in Section 4. The simulation study and conclusions are discussed in Section 5, and concluding remarks are given in Section 6.

Sampling Procedure and Estimating the Mean
An extension of the AM method consists of two phases. In the first phase, we draw a random sample of size $r^2$ from the population of interest and then select a sample of only $r$ units from those $r^2$ units. In the second phase, we repeat the first phase $m$ times such that $rm = n$, where $n$ is the final sample size. The procedure can be described as follows:
1. Select a random sample of size $r^2$ from the population of interest.
2. Arrange the $r^2$ units in ascending order and then divide the ordered sample into $r$ sets of $r$ units each.
3. Choose a sample of size $r$ for the actual analysis. This sample consists of the smallest ranked unit from the first set, the second smallest ranked unit from the second set, and so on, until the largest ranked unit is selected from the last set. Since the $k$-th set contains the units with ranks $(k-1)r+1, \dots, kr$, the unit selected from it has overall rank $i_k = (k-1)r + k = (k-1)(r+1)+1$, so the chosen sample of size $r$ is
$$\left(x_{(1:r^2)},\; x_{(r+2:r^2)},\; \dots,\; x_{(r^2:r^2)}\right).$$
4. Repeat steps 1 to 3 $m$ times (cycles) until the desired sample size $n = mr$ is obtained. The final sample is
$$\begin{pmatrix}
x_{(1:r^2)1} & x_{(1:r^2)2} & \cdots & x_{(1:r^2)m} \\
x_{(r+2:r^2)1} & x_{(r+2:r^2)2} & \cdots & x_{(r+2:r^2)m} \\
\vdots & \vdots & \ddots & \vdots \\
x_{(r^2:r^2)1} & x_{(r^2:r^2)2} & \cdots & x_{(r^2:r^2)m}
\end{pmatrix},$$
where $x_{(i:r^2)j}$ represents the $i$-th order observation in the $j$-th column, $i = 1, r+2, \dots, r^2$, $j = 1, 2, \dots, m$, and each column represents the $r$ units selected in one repetition of steps 1 to 3. Note that units in the same row of the final sample are independent and identically distributed.
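The four steps can be sketched in Python as follows. The population is represented by a hypothetical `draw` callable returning one unit per call; the function names are illustrative only. The 0-based position `k*(r+1)` in the sorted first-phase sample corresponds to the overall rank $(k-1)(r+1)+1$ in the 1-based notation of step 3.

```python
import random
from typing import Callable, List


def select_diagonal(sorted_units: List[float], r: int) -> List[float]:
    """Step 3: the r*r sorted units form r sets of r consecutive ranks;
    take the k-th smallest unit of the k-th set. With 0-based positions this
    is index k*(r+1), i.e. overall ranks 1, r+2, ..., r^2."""
    if len(sorted_units) != r * r:
        raise ValueError("expected exactly r*r units")
    return [sorted_units[k * (r + 1)] for k in range(r)]


def am_sample(draw: Callable[[], float], r: int, m: int) -> List[List[float]]:
    """Steps 1-4: run the first phase m times (cycles).
    Column j of the result holds the r units selected in cycle j."""
    columns = []
    for _ in range(m):
        first_phase = sorted(draw() for _ in range(r * r))  # steps 1-2
        columns.append(select_diagonal(first_phase, r))      # step 3
    return columns  # step 4: n = m*r units in total


if __name__ == "__main__":
    rng = random.Random(2007)
    print(am_sample(rng.random, r=3, m=2))  # U(0,1) population, n = 6
```

Within each returned column the units are in ascending order, because they are read off a single sorted first-phase sample.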
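Using this sampling procedure, the estimator of the population mean $\mu$ based on a sample of size $n = mr$ is
$$\bar X_{AM} = \frac{1}{n}\sum_{j=1}^{m}\sum_{k=1}^{r} x_{(i_k:r^2)j}, \qquad i_k = (k-1)(r+1)+1,$$
where $i_k$ is the rank of the unit selected from the $k$-th set. Its expected value is
$$E(\bar X_{AM}) = \frac{1}{r}\sum_{k=1}^{r}\mu_{(i_k:r^2)}, \qquad \mu_{(i:r^2)} = \int_{-\infty}^{\infty} x\, f_{(i:r^2)}(x)\,dx,$$
where the probability density function of $X_{(i:r^2)}$ is
$$f_{(i:r^2)}(x) = \frac{(r^2)!}{(i-1)!\,(r^2-i)!}\,[F(x)]^{i-1}\,[1-F(x)]^{r^2-i}\,f(x),$$
and $F$ and $f$ denote the distribution function and density of the population. Since the $m$ cycles are independent and identically distributed, its variance is
$$\operatorname{var}(\bar X_{AM}) = \frac{1}{m r^2}\sum_{k=1}^{r}\sum_{l=1}^{r}\sigma_{(i_k,i_l:r^2)},$$
where $\sigma_{(i,j:r^2)} = \operatorname{cov}(X_{(i:r^2)}, X_{(j:r^2)})$; the covariance terms appear because the $r$ units of one cycle are order statistics of the same first-phase sample.
For any distribution that is symmetric around zero, $\mu_{(i:N)} = -\mu_{(N-i+1:N)}$ (Balakrishnan and Cohen, 1990). The selected ranks are symmetric as well, since $r^2 - i_k + 1 = i_{r-k+1}$, so the terms of $E(\bar X_{AM})$ cancel in pairs and $E(\bar X_{AM}) = 0$; by a location shift, $E(\bar X_{AM}) = \mu$ for any distribution symmetric around $\mu$. Thus, this estimator is unbiased for such distributions. Moreover, the relative efficiency of this estimator with respect to the traditional simple random sample (SRS) estimator is
$$\operatorname{eff}(\bar X_{AM}) = \frac{\operatorname{var}(\bar X_{SRS})}{\operatorname{var}(\bar X_{AM})} = \frac{\sigma^2/n}{\operatorname{var}(\bar X_{AM})},$$
where $\bar X_{SRS} = \sum_{i=1}^{n} x_i/n$ and $\sigma^2$ denotes the variance of the population.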
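The unbiasedness and efficiency statements for symmetric populations can be checked numerically. The Monte Carlo sketch below assumes a N(0, 1) population (symmetric around zero, so the target mean is 0 and $\sigma^2 = 1$); the values of r, m, the number of replications and the seed are arbitrary choices, and the estimated efficiency carries simulation error.

```python
import random
import statistics


def am_mean(rng: random.Random, r: int, m: int) -> float:
    """One realisation of the AM mean estimator for a N(0, 1) population."""
    total = 0.0
    for _ in range(m):
        units = sorted(rng.gauss(0.0, 1.0) for _ in range(r * r))
        total += sum(units[k * (r + 1)] for k in range(r))  # ranks i_k
    return total / (m * r)


def srs_mean(rng: random.Random, n: int) -> float:
    """One realisation of the SRS mean of n independent N(0, 1) draws."""
    return sum(rng.gauss(0.0, 1.0) for _ in range(n)) / n


def monte_carlo(r: int = 3, m: int = 4, reps: int = 4000, seed: int = 2007):
    """Estimate E(X_AM) and eff = var(X_SRS) / var(X_AM) by simulation."""
    rng = random.Random(seed)
    am = [am_mean(rng, r, m) for _ in range(reps)]
    srs = [srs_mean(rng, r * m) for _ in range(reps)]
    eff = statistics.pvariance(srs) / statistics.pvariance(am)
    return statistics.fmean(am), eff
```

Calling `monte_carlo()` returns the simulated mean of $\bar X_{AM}$, which should be close to 0, and the simulated relative efficiency.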

Results for the Standard Uniform Distribution
Suppose that we select a random sample of size $n = mr$ from the standard uniform distribution $U(0, 1)$. It is easy to verify that $E(\bar X_{AM}) = 1/2$ and $\operatorname{var}(\bar X_{SRS})/\operatorname{var}(\bar X_{AM}) \ge 1$. Equality holds for $r = 1$, when the procedure reduces to simple random sampling. Hence the extension of the AM method gives an unbiased and more efficient estimator of the uniform population mean than the SRS estimator. Figure 1 shows that the relative efficiency of the estimator increases as $r$ increases.
Figure 1: The relative efficiency of the estimator as a function of r.
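For $U(0,1)$ the order-statistic moments are available in closed form: $E(U_{(i:N)}) = i/(N+1)$ and $\operatorname{cov}(U_{(i:N)}, U_{(j:N)}) = i(N-j+1)/((N+1)^2(N+2))$ for $i \le j$. The sketch below computes $E(\bar X_{AM})$ and the relative efficiency exactly with rational arithmetic, treating the $m$ cycles as independent and keeping the covariances between units drawn in the same cycle. The function names are hypothetical.

```python
from fractions import Fraction
from typing import List, Tuple


def selected_ranks(r: int) -> List[int]:
    """Ranks i_k = (k-1)(r+1) + 1, k = 1..r, kept from each first-phase sample."""
    return [(k - 1) * (r + 1) + 1 for k in range(1, r + 1)]


def uniform_cov(i: int, j: int, n_big: int) -> Fraction:
    """Cov(U_(i:N), U_(j:N)) for U(0,1) order statistics; the variance when i == j."""
    i, j = min(i, j), max(i, j)
    return Fraction(i * (n_big - j + 1), (n_big + 1) ** 2 * (n_big + 2))


def uniform_am(r: int, m: int = 1) -> Tuple[Fraction, Fraction]:
    """Return (E(X_AM), var(X_SRS) / var(X_AM)) for a U(0,1) population."""
    n_big = r * r
    ranks = selected_ranks(r)
    mean = sum(Fraction(i, n_big + 1) for i in ranks) / r
    # Cycles are independent; units within one cycle share a first-phase sample.
    var_am = sum(uniform_cov(i, j, n_big) for i in ranks for j in ranks) / (m * r * r)
    var_srs = Fraction(1, 12) / (m * r)  # sigma^2 / n with sigma^2 = 1/12
    return mean, var_srs / var_am
```

For example, `uniform_am(2)` gives an efficiency of 5/2 and `uniform_am(3)` gives 55/13. Because $m$ cancels in the ratio, the efficiency depends on $r$ only.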

Results for Exponential Distribution
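Consider the case of selecting random samples from an exponential distribution with mean 1. For this distribution, $E(X_{(i:N)}) = \sum_{s=1}^{i} 1/(N-s+1)$ and $\operatorname{cov}(X_{(i:N)}, X_{(j:N)}) = \operatorname{var}(X_{(i:N)}) = \sum_{s=1}^{i} 1/(N-s+1)^2$ for $i \le j$. With $i_k = (k-1)(r+1)+1$, the mean and the variance of the proposed estimator are therefore
$$E(\bar X_{AM}) = \frac{1}{r}\sum_{k=1}^{r}\sum_{s=1}^{i_k}\frac{1}{r^2-s+1}, \qquad \operatorname{var}(\bar X_{AM}) = \frac{1}{m r^2}\sum_{k=1}^{r}\sum_{l=1}^{r}\sum_{s=1}^{\min(i_k,i_l)}\frac{1}{(r^2-s+1)^2}.$$
Since $\bar X_{AM}$ is in general biased for skewed distributions such as the exponential, we consider the efficiency $\operatorname{var}(\bar X_{SRS})/\operatorname{MSE}(\bar X_{AM})$, with mean squared error $\operatorname{MSE}(\bar X_{AM}) = \operatorname{var}(\bar X_{AM}) + \left[E(\bar X_{AM}) - 1\right]^2$. Figure 2 shows that the efficiency of the estimator increases slowly as $r$ increases.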
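The exponential quantities can likewise be computed exactly. This sketch uses the standard moments of exponential order statistics stated above. Because the squared bias does not shrink as $m$ grows while the variance does, the efficiency here depends on $m$ as well as $r$, so $m$ is an explicit parameter. The function names are hypothetical, and the sketch does not attempt to reproduce the values plotted in Figure 2.

```python
from fractions import Fraction
from typing import List, Tuple


def selected_ranks(r: int) -> List[int]:
    """Ranks i_k = (k-1)(r+1) + 1, k = 1..r."""
    return [(k - 1) * (r + 1) + 1 for k in range(1, r + 1)]


def exp_os_mean(i: int, n_big: int) -> Fraction:
    """E(X_(i:N)) for a standard exponential: sum_{s=1}^{i} 1/(N-s+1)."""
    return sum((Fraction(1, n_big - s + 1) for s in range(1, i + 1)), Fraction(0))


def exp_os_cov(i: int, j: int, n_big: int) -> Fraction:
    """Cov(X_(i:N), X_(j:N)) = Var(X_(min(i,j):N)) = sum_{s=1}^{min(i,j)} 1/(N-s+1)^2."""
    return sum((Fraction(1, (n_big - s + 1) ** 2) for s in range(1, min(i, j) + 1)),
               Fraction(0))


def exponential_am(r: int, m: int) -> Tuple[Fraction, Fraction, Fraction]:
    """Return (E(X_AM), MSE(X_AM), var(X_SRS) / MSE(X_AM)) for an exponential with mean 1."""
    n_big = r * r
    ranks = selected_ranks(r)
    mean = sum(exp_os_mean(i, n_big) for i in ranks) / r
    var_am = sum(exp_os_cov(i, j, n_big) for i in ranks for j in ranks) / (m * r * r)
    mse = var_am + (mean - 1) ** 2  # the squared bias does not shrink with m
    var_srs = Fraction(1, m * r)    # sigma^2 / n with sigma^2 = 1
    return mean, mse, var_srs / mse
```

For instance, `exponential_am(2, 1)` returns a mean of 7/6, an MSE of 31/72 and an efficiency of 36/31.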

Estimating Regression Coefficients
The procedure consists of ordering the pairs $(x_i, y_i)$, $i = 1, \dots, n$, by the magnitude of the $x_i$'s and splitting the observations into sets. This can be described as follows: 1. Select a random sample of size $r^2$ from the population.
To illustrate the efficiency of this method in the bivariate case, we estimate the parameters of the simple linear regression model
$$Y = \alpha + \beta X + \varepsilon, \qquad (1)$$
where $Y$ is the response variable, the intercept $\alpha$ and the slope $\beta$ are unknown parameters, $X$ is the predictor variable and $\varepsilon$ is a random error term assumed to have zero mean and variance $\sigma^2$.
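Let $\hat\alpha_{SRS}$ and $\hat\beta_{SRS}$ denote the least squares estimators of $\alpha$ and $\beta$, respectively, obtained from a SRS, i.e.
$$\hat\beta_{SRS} = \frac{\sum_{i=1}^{n}(x_i - \bar x_{SRS})(y_i - \bar y_{SRS})}{\sum_{i=1}^{n}(x_i - \bar x_{SRS})^2}, \qquad \hat\alpha_{SRS} = \bar y_{SRS} - \hat\beta_{SRS}\,\bar x_{SRS},$$
where $\bar x_{SRS} = \sum_{i=1}^{n} x_i/n$ and $\bar y_{SRS} = \sum_{i=1}^{n} y_i/n$. Similarly, the least squares estimators of $\alpha$ and $\beta$ obtained from a sample of size $n = rm$ drawn with the proposed method are
$$\hat\beta_{AM} = \frac{\sum_{j=1}^{m}\sum_{k=1}^{r}(x_{kj} - \bar x_{AM})(y_{kj} - \bar y_{AM})}{\sum_{j=1}^{m}\sum_{k=1}^{r}(x_{kj} - \bar x_{AM})^2}, \qquad \hat\alpha_{AM} = \bar y_{AM} - \hat\beta_{AM}\,\bar x_{AM},$$
where $(x_{kj}, y_{kj})$ is the $k$-th selected pair in the $j$-th cycle and $\bar x_{AM}$, $\bar y_{AM}$ are the corresponding sample means. It can be shown that the standard errors of these estimates are the square roots of
$$\widehat{\operatorname{var}}(\hat\beta) = \frac{\hat\sigma^2}{S_{xx}}, \qquad \widehat{\operatorname{var}}(\hat\alpha) = \hat\sigma^2\left(\frac{1}{n} + \frac{\bar x^2}{S_{xx}}\right),$$
where $S_{xx} = \sum (x - \bar x)^2$, $\hat\sigma^2 = \sum (y - \hat\alpha - \hat\beta x)^2/(n-2)$, and all sums and means are taken over the SRS or the AM sample, as appropriate.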
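These least squares formulas and their usual standard errors can be sketched in a few lines of Python, with $\sigma^2$ estimated by SSE/(n−2). The same function applies to the SRS sample and to the AM sample; only the data passed in differ. The function name `ls_with_se` is hypothetical.

```python
import math
from typing import Sequence, Tuple


def ls_with_se(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float, float]:
    """Return (alpha_hat, beta_hat, se(alpha_hat), se(beta_hat)) for y = alpha + beta*x + eps."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need equally long x and y with at least 3 points")
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = ybar - beta * xbar
    sse = sum((yi - alpha - beta * xi) ** 2 for xi, yi in zip(x, y))
    sigma2 = sse / (n - 2)  # residual variance estimate
    se_beta = math.sqrt(sigma2 / sxx)
    se_alpha = math.sqrt(sigma2 * (1.0 / n + xbar ** 2 / sxx))
    return alpha, beta, se_alpha, se_beta
```

For x = (0, 1, 2, 3) and y = (1, 2, 2, 4), the sketch gives $\hat\alpha = \hat\beta = 0.9$, with standard errors $\sqrt{0.245}$ and $\sqrt{0.07}$.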

Real Data Application
The following example illustrates the extension of the AM method for the bivariate case. We use the so-called car data from Graybill and Iyer (1994). Twenty cars were selected using SRS and using the AM procedure; the selected data are shown in Tables 1 and 2, respectively. The response variable is the first-year maintenance cost and the explanatory variable is the number of miles driven during the first year after purchase. Based on these two samples, estimates of the simple linear regression parameters and their standard errors are calculated and given in Table 3.
Table 3 shows that the proposed method yields smaller standard errors overall than the traditional SRS. Moreover, Figure 3 shows the residuals and the predicted values for both estimation methods; the two methods give very similar residuals and behave in a similar way.

Simulation Study
To compare the performance of the estimates of $\alpha$ and $\beta$ obtained using the extension of the AM method and a SRS, a simulation study is conducted. Define the relative efficiency of $\hat\alpha_{AM}$ with respect to $\hat\alpha_{SRS}$ as $\operatorname{Eff}(\hat\alpha) = \operatorname{MSE}(\hat\alpha_{SRS})/\operatorname{MSE}(\hat\alpha_{AM})$, where $\operatorname{MSE}(\hat\alpha_{SRS})$ and $\operatorname{MSE}(\hat\alpha_{AM})$ are the mean squared errors of $\hat\alpha_{SRS}$ and $\hat\alpha_{AM}$, respectively. The efficiency $\operatorname{Eff}(\hat\beta)$ is defined similarly. We simulate data from model (1) with $\alpha = \beta = 1$, i.e. $y_i = 1 + x_i + \varepsilon_i$, where $x_i \sim N(0, 1)$, and we consider different distributions for the error term:
1. Symmetric around zero with different scales, $\varepsilon_i \sim N(0, 1)$ and $\varepsilon_i \sim N(0, 4)$.
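A small Monte Carlo sketch of this design for the normal error case $\varepsilon_i \sim N(0, 1)$ is shown below. It assumes the bivariate AM sample is formed by ordering each first-phase sample of $r^2$ pairs by $x$ and keeping the pairs with ranks $(k-1)(r+1)+1$, with each $y$ following its $x$. The values of r, m, the replication count and the seed are arbitrary choices, and the output is not meant to reproduce Tables 4 to 11.

```python
import random
import statistics
from typing import List, Tuple

Pair = Tuple[float, float]


def draw_pair(rng: random.Random, sigma: float) -> Pair:
    """One (x, y) from model (1) with alpha = beta = 1, x ~ N(0,1), eps ~ N(0, sigma^2)."""
    x = rng.gauss(0.0, 1.0)
    return x, 1.0 + x + rng.gauss(0.0, sigma)


def srs_sample(rng: random.Random, n: int, sigma: float) -> List[Pair]:
    return [draw_pair(rng, sigma) for _ in range(n)]


def am_sample(rng: random.Random, r: int, m: int, sigma: float) -> List[Pair]:
    """Bivariate AM sample: in each of m cycles, order r*r pairs by x and
    keep the pairs with ranks (k-1)(r+1) + 1 (y follows its x)."""
    sample: List[Pair] = []
    for _ in range(m):
        phase = sorted((draw_pair(rng, sigma) for _ in range(r * r)), key=lambda p: p[0])
        sample.extend(phase[k * (r + 1)] for k in range(r))
    return sample


def ls_fit(sample: List[Pair]) -> Tuple[float, float]:
    """Least squares (intercept, slope)."""
    n = len(sample)
    xbar = sum(x for x, _ in sample) / n
    ybar = sum(y for _, y in sample) / n
    sxx = sum((x - xbar) ** 2 for x, _ in sample)
    sxy = sum((x - xbar) * (y - ybar) for x, y in sample)
    beta = sxy / sxx
    return ybar - beta * xbar, beta


def efficiencies(r: int = 3, m: int = 10, reps: int = 2000,
                 sigma: float = 1.0, seed: int = 36) -> Tuple[float, float]:
    """Monte Carlo estimates of Eff(alpha) and Eff(beta) = MSE_SRS / MSE_AM."""
    rng = random.Random(seed)
    sq_err = {"srs": ([], []), "am": ([], [])}
    for _ in range(reps):
        samples = {"srs": srs_sample(rng, r * m, sigma), "am": am_sample(rng, r, m, sigma)}
        for name, sample in samples.items():
            a, b = ls_fit(sample)
            sq_err[name][0].append((a - 1.0) ** 2)
            sq_err[name][1].append((b - 1.0) ** 2)
    mse = {name: (statistics.fmean(ea), statistics.fmean(eb))
           for name, (ea, eb) in sq_err.items()}
    return mse["srs"][0] / mse["am"][0], mse["srs"][1] / mse["am"][1]
```

Selecting ranks spread over the whole first-phase sample increases the spread of the $x$ values, which is why the gain in this sketch shows up mainly in the slope.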
Simulation results are provided in Tables 4 to 11. The biases are given in Tables 4, 6, 8, and 10, and the MSEs in Tables 5, 7, 9, and 11. From these tables we conclude that the AM estimates perform well for different types of error distributions. More specifically, the biases and mean squared errors of the estimated parameters decrease as the sample size increases, and the proposed AM estimates have smaller biases than the traditional SRS estimates. The computed efficiencies show that the proposed AM estimates are more efficient than the SRS estimates, and that the AM method is more efficient in estimating the slope than in estimating the intercept.
For estimating the mean of a symmetric distribution, the analytical results show that the AM method yields an unbiased estimator. Moreover, the relative efficiencies in Figures 1 and 2 for the standard uniform and the exponential distributions show that the AM estimator is preferable to the traditional estimator based on a SRS.

Concluding Remarks
In summary, the simulation results demonstrate that the AM estimates are superior to, and often closer to the true parameter values than, the traditional estimates based on a SRS. In all situations considered in the simulation study, i.e. for all the error-term distributions used, the AM estimates are more efficient than the estimates based on a SRS. Consequently, the AM estimator can be recommended for estimating the mean of symmetric distributions and the parameters of the simple linear regression model.

Figure 2: The efficiency of the estimator as a function of r for the exponential distribution.

Figure 3: Residuals and predicted values for both estimation methods.

Table 2: Data using the AM procedure with r = 5.