Simulation Studies for Complex Sampling Designs

Choosing an appropriate variance estimation method for complex surveys is a difficult task, since there exist a variety of techniques which usually cannot be compared mathematically. A relatively easy way to accomplish such a comparison is by means of simulation studies. Though simulation studies are widely used in statistics, they are not a standard tool for investigating properties of estimators in complex survey sampling designs. In this paper we describe the setup for a simulation study according to the sampling plan of the Austrian Microcensus (AMC) used 1994-2003, which is an example of a very complex sampling plan. To illustrate the procedure we conducted a simulation study comparing basic variance estimators. The results reveal the extent to which simple variance estimators may underestimate the true sampling error in close-to-reality situations.


Introduction
Variance estimation for complex surveys is a challenging problem. Although a variety of techniques for variance estimation exists (see e.g. Wolter, 1985), theoretical comparisons of the properties of different estimators are feasible at most for rather simple sampling designs.
DACSEIS (Data Quality in Complex Surveys within the New European Information Society) was a project within the IST program of the European Commission which investigated variance estimation methods for complex surveys. One of its main tasks was the realization of simulation studies to compare different variance estimation techniques for several national surveys (the outline of the project is given in Münnich and Wiegert, 2001; the investigated surveys are described in Quatember, 2002).
A basic prerequisite for simulation studies is the availability of adequate universes from which samples according to a specific sampling plan can be drawn repeatedly. As data of the relevant national universes, i.e. census data, are in general not available for simulation studies, pseudo universes have to be constructed from survey data. These pseudo universes should allow sampling according to the sampling plan of interest, be close to the respective national universe regarding the distributions of the variables of interest, and not violate disclosure control rules, see Münnich and Schürle (2003). To meet these requirements, the structure of the universe according to the sampling plan has to be rebuilt, the sizes of strata and clusters should be correct, and homogeneity within as well as heterogeneity between strata and clusters should be replicated in the pseudo universes. To avoid possible identification of individuals, the generation process has to be at least partly stochastic.
Once generated, a pseudo universe can easily be modified to study different aspects of the sampling scheme, for instance the effect of a different sampling frame or of a particular non-response mechanism. Samples from a pseudo universe can provide all estimates of interest, and their simulation distribution gives detailed insight into their performance.
As the DACSEIS project started in 2001, the sampling plan of the AMC that was used until 2003 was investigated for Austria. It turned out to be the most complex sampling plan studied within the project and can thus serve as an example of a complex sampling design. Since 2004 the AMC has been carried out according to a different, simpler sampling plan.
In this paper we describe the generation of a pseudo universe appropriate for the former AMC sampling plan. Due to the complexity of the sampling plan, a restriction to its basic properties was necessary. These are described in Section 2. Section 3 deals with the generation of the pseudo universe, and in Section 4 the implementation of the sampling procedure is described. The simulation study presented in Section 5 illustrates the application of the method for several basic variance estimators and for modifications of the pseudo universe, i.e. a different sampling frame and a certain non-response mechanism. The summary given in Section 6 concludes the paper.

The Sampling Design
The Austrian Microcensus is a quarterly survey of Austrian households and has been conducted by interviewers since 1967. It is intended to provide information on the structure of the Austrian population, families, households and dwellings. The questionnaire contains a mandatory core program and a voluntary supplementary program. Until 2003, 1% of the Austrian households were selected; at present the figure is 0.68%.
The sampling frame for the AMC used 1994-2003 was the Austrian Housing Census (HWZ = Häuser- und Wohnungszählung), which is carried out every 10 years. Sampling units were dwellings. These were selected for all AMC surveys to be conducted in the following 10 years. Each quarter one eighth of the sampling units was replaced, thus limiting the participation of a sampling unit to a maximum of 2 years in a row. For each sample dwelling, characteristics of all households and persons living therein were recorded.
The sample design of the former AMC consisted of two parts. The first, called Part A in the following, comprised mainly dwellings in larger urban municipalities; the other, called Part B, comprised dwellings in small, rural communities. Sampling was carried out separately for each of the nine federal states in these two parts, except for two federal states (Wien and Vorarlberg), which consisted only of Part A dwellings.
In Part A dwellings were selected as a stratified random sample, where strata were built according to several dwelling characteristics, such as kind of dwelling, period of construction, floor space etc. As a combination of all stratification variables would result in very small strata, these were pooled to give sample sizes of at least 10 dwellings per stratum, resulting in 100-150 strata per federal state. The sampling fraction was different for each of the nine federal states.
In Part B a two-stage sampling procedure with stratified random sampling of primary sampling units (PSUs) was carried out. PSUs are communities or, in the case of very small communities, groups of communities. For PSUs the number of dwellings and the agrarian quota were used as stratification variables, with different values defining the strata in each federal state. The number of strata per federal state ranged from 5 to 16.
Within each stratum the sample size was allocated proportional to the size (i.e. number of dwellings) of districts. PSUs were selected randomly within each stratum. At the second stage a specified number of dwellings was drawn from the selected PSUs as secondary sampling units (SSUs). Depending on the federal state, the number of SSUs was 20 or 25.
Selection of dwellings was carried out systematically, as a list-sequential selection with a fixed starting value, within each federal state of Part A as well as within the selected PSUs of Part B.
Figure 1 illustrates the hierarchical structure of the universe according to the former AMC sampling plan. A more detailed description of this sampling plan can be found in Haslinger (1996).
The pseudo universe for simulation studies according to the former AMC sampling plan, which we refer to as the AMC pseudo universe, was generated following the general process for the generation of pseudo universes developed within the DACSEIS project. This process was applied to build pseudo universes for different labour force and Microcensus surveys and is described in detail in Münnich and Schürle (2003). The principles of this generation process are exemplified for the AMC pseudo universe in Section 3.1; more technical details of its generation are given in Section 3.2.

Basic Principles for the Generation of the AMC Pseudo Universe
To generate a pseudo universe for simulation studies, various aspects have to be considered. For the AMC pseudo universe these are, for example:
• The pseudo universe should have the same structure as the real Austrian universe in all relevant aspects of the sampling plan, i.e. reflect the hierarchical structure defined by federal states, strata, PSUs, dwellings, households and persons.
• The generated pseudo universe should be close to reality regarding the distribution of the variables of interest. In particular, all features which have an effect on the variance of estimators have to be taken into account. Thus the generation of the AMC pseudo universe should reflect homogeneity within and heterogeneity between strata in Part A as well as PSUs in Part B.
• As the intended simulation studies are demanding in terms of both CPU time and storage, the pseudo universe should be as small as possible regarding the number of variables.
A restriction to only a few variables impedes the identification of individuals and is thus also advantageous with a view to disclosure control. So apart from the structure variables of the sampling plan, only five personal characteristics were generated for pseudo individuals.
One of the main principles of the generation process in the DACSEIS project is to rebuild the structure of the universe concerning strata and clusters. This part of the generation process is deterministic, as information on the numbers and sizes of strata and clusters is available from the sampling plan. The pseudo universe thus has the same structure as the real universe from the viewpoint of the sampling plan.
The generation of sampling units, which are dwellings or households in most surveys, is carried out stochastically. A main problem in this context is to find a compromise between neglecting and maintaining correlation structures within sampling units (see also Münnich and Schürle, 2003). Creating individuals independently could lead to unrealistic results, e.g. a household consisting of children only, whereas taking into account all correlations would amount to sampling from high-dimensional densities. Therefore, to simplify sampling, the age and gender structure is drawn from the data, i.e. from real households or dwellings, and values of the remaining variables are generated independently for each individual conditional on age and gender. To obtain a close-to-reality situation, empirical distributions from the data are used as generation distributions. Different generation distributions are used in different strata and clusters. All dwellings in one stratum or cluster share the same generation distribution; such a set of dwellings is referred to as a generation group in the following. This procedure accounts for
• homogeneity within groups, by using the empirical distribution from this group,
• heterogeneity between groups, by using different distributions.
Generation of the AMC pseudo universe was deterministic concerning the structure down to the hierarchical level of sampling units, as illustrated in Figure 1, and stochastic for sampling units, i.e. dwellings in the AMC. The hierarchical structure within dwellings is illustrated in Figure 2. For each dwelling, values were created for the number of households, the number of persons per household and the personal characteristics of each person (personal characteristics and possible outcomes are displayed in Table 2).
To describe the stochastic part of the generation process more formally, let $P_y$ denote the empirical distribution of $y$ and $P^x_y$ the conditional empirical distribution of $y$ given $x$ within a generation group. The hierarchical structure within a dwelling, i.e. the number of households and the number of persons per household, is generated from the corresponding empirical distributions of the generation group. The personal characteristics are generated for each household separately. Age $x_1$ and gender $x_2$ of all $p$ persons in a household are generated in one step from $P^p_{(x_{11},\ldots,x_{1p},x_{21},\ldots,x_{2p})}$, which denotes the joint empirical distribution of $x_1$ and $x_2$ for all persons, determined from all households consisting of $p$ persons in the generation group. For a 3-person household, for example, values for age and gender are generated as random numbers from the appropriate 6-dimensional empirical distribution of all households of 3 persons.
The remaining personal characteristics $x_3$, $x_4$, $x_5$, i.e. nationality, employment and educational level, are generated independently for each person from their joint distribution in the generation group, given the values of $x_1$ and $x_2$. As generation groups are rather small, in many cases only one person with a given age and gender would exist in the AMC data. Generating the additional personal characteristics according to their conditional distribution given age and gender would then result in a "cloning" of this individual and, if each individual of a household is the only person of that age and gender in the generation group, in a replication of entire households. To reduce the extent of replications, a modified variable $x_1^*$, i.e. age measured in 5-year categories, was used for the construction of the conditional distributions. Thus for each person the values of $x_3$, $x_4$, $x_5$ are actually generated as random numbers from the conditional empirical distribution given $x_1^*$ and $x_2$, i.e. $(x_3, x_4, x_5) \sim P^{(x_1^*, x_2)}_{(x_3, x_4, x_5)}$.
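The two-step generation of persons described above can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the record layout (dictionaries with the keys `age`, `gender`, `nationality`, `employment`, `education`) and all function names are assumptions made for this example. Drawing uniformly from the list of observed households, or observed persons in a cell, amounts to sampling from the corresponding empirical distribution.

```python
import random
from collections import defaultdict

def age_class(age, width=5):
    """Map an age in years to a 5-year category (the modified variable x1*)."""
    return age // width

def build_generation_distributions(households):
    """Collect the empirical distributions of one generation group.

    households: list of households, each a list of person dicts
    (hypothetical layout, see the lead-in).
    """
    by_size = defaultdict(list)  # household size p -> observed (age, gender) profiles
    by_cell = defaultdict(list)  # (age class, gender) -> observed (x3, x4, x5) triples
    for hh in households:
        by_size[len(hh)].append([(p["age"], p["gender"]) for p in hh])
        for p in hh:
            by_cell[(age_class(p["age"]), p["gender"])].append(
                (p["nationality"], p["employment"], p["education"]))
    return by_size, by_cell

def generate_household(size, by_size, by_cell, rng):
    """Draw one synthetic household with the given number of persons."""
    # Step 1: ages and genders of all persons are drawn jointly.
    profile = rng.choice(by_size[size])
    household = []
    for age, gender in profile:
        # Step 2: remaining characteristics, conditional on age class and gender.
        nat, emp, edu = rng.choice(by_cell[(age_class(age), gender)])
        household.append({"age": age, "gender": gender, "nationality": nat,
                          "employment": emp, "education": edu})
    return household
```

Because the second step conditions on the 5-year age class rather than on exact age, a generated person can combine the age of one observed person with the remaining characteristics of another, which is what limits the replication of entire households.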

The AMC Pseudo Universe
Generation from AMC Data: For the generation of the AMC pseudo universe, AMC data of quarter 1 in 2001 were used. These comprise a total of 233 variables containing information on dwelling, household and personal characteristics. Except in cases where missing values occur at the household or dwelling level, every record contains the data of one individual. The relevant personal characteristics used for the generation process are displayed in Table 2.
In the generation process the sizes of strata and PSUs, i.e. the number of dwellings $W$ in a stratum of Part A or a PSU of Part B, and the number of PSUs in a stratum of Part B were considered deterministic. Whereas the latter remains constant and is therefore known from the sampling plan, the number of dwellings changes over time and had to be estimated.
Information on the number of dwellings in each Part A stratum and each Part B PSU in Austria was available from the HWZ 91, i.e. the Austrian Housing Census of 1991. Changes in the stock of dwellings are reflected in the AMC, as dwellings that drop out are reported and new dwellings are selected in addition to the initial sample. Households and persons had to be generated only for dwellings serving as permanent residence; no households and individuals were generated for other dwellings.
The current number of dwellings of both types in each generation group represented in the AMC was estimated by multiplying the number $W$ of dwellings of each type in the HWZ by the change ratio $w_{act}/w_0$, i.e. their number in the current AMC data, $w_{act}$, divided by their number in the first AMC sample, $w_0$.
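As a small illustration of this change-ratio adjustment, the following sketch scales a census count by $w_{act}/w_0$. The function name, the rounding to an integer and the numbers in the usage note are assumptions made for this example; the source does not state how non-integer results were handled.

```python
def estimate_current_dwellings(W_census, w_initial, w_actual):
    """Estimate the current number of dwellings in a generation group.

    W_census:  number of dwellings of one type in the housing census (W)
    w_initial: number of such dwellings in the first AMC sample (w_0)
    w_actual:  number of such dwellings in the current AMC data (w_act)
    Rounding to an integer is an assumption of this sketch.
    """
    if w_initial <= 0:
        raise ValueError("the initial sample count w_0 must be positive")
    return round(W_census * w_actual / w_initial)
```

For instance, with hypothetical values `estimate_current_dwellings(1200, 12, 13)`, a stratum with 1200 census dwellings whose sample grew from 12 to 13 dwellings would be assigned 1300 dwellings in the pseudo universe.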
For each virtual dwelling serving as permanent residence, the number of households, the number of persons per household, and values for the personal characteristics of individuals were generated as random numbers according to the model described before; for other dwellings the number of households and persons per household were set to zero.
Generation distributions according to the general model were built separately from this data set for each of the 1557 generation groups represented therein, i.e. for each of the 1189 Part A strata and each of the 368 sample PSUs of Part B. Generation of dwellings in Part B therefore needed some modification, as not every PSU is represented in the AMC. To generate a specific PSU of the pseudo universe, a sample PSU of the AMC in the same stratum was chosen at random and used as a model for the generation process. The current numbers of permanent residence dwellings and of other dwellings were estimated using their respective change ratios $w_{act}/w_0$ in the model PSU. For the creation of permanent residence dwellings the generation distributions of the model PSU were used. So in every stratum of Part B several PSUs in the pseudo universe share the same generation distributions. Due to the random nature of the generation process and their different sizes, these PSUs are not identical. Given the model PSU, the procedure for the generation of dwellings was the same as for Part A dwellings.
In a last step all dwellings of a generation group, i.e. permanent residence and other dwellings, were pooled and arranged such that the positions of other dwellings were chosen at random and permanent residence dwellings were arranged according to their generation order.
Comparing Pseudo Universe and AMC: Generation of the AMC pseudo universe was performed for small groups, which are aggregated to form the total pseudo universe. It is therefore of interest whether the structure in the pseudo universe corresponds to that in the AMC data. Figure 3 shows the marginal distributions of the generated personal characteristics age, gender and educational level; for age, relative differences are also given. The frequencies realized in the pseudo universe are compared to so-called "expected frequencies", which are computed from the empirical distributions within strata in the AMC data, given the number of persons generated per stratum. Only for Part A are these the frequencies expected from the generation distributions, conditional on the number of individuals per stratum. Distributions in Part B strata are in fact a mixture of the different generation distributions, i.e. the empirical distributions within model PSUs, where the mixing proportions are the proportions of individuals in the respective model PSUs of one stratum. Differences between realized and "expected" frequencies are small.
To assess whether the correlation structures in the pseudo universe are similar to those in the AMC data, contingency coefficients were computed; they are shown in Table 3. Absolute differences are small, and relative differences are large only for small contingency coefficients. It can therefore be concluded that the global structure of the AMC data is reproduced well in the pseudo universe.
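A comparison of this kind can be sketched with Pearson's contingency coefficient $C = \sqrt{\chi^2 / (\chi^2 + n)}$, computed once for the survey data and once for the pseudo universe. Note that the text does not say which variant of the contingency coefficient was used, so the choice of the uncorrected Pearson coefficient, like the function name, is an assumption of this sketch.

```python
import math

def pearson_contingency_coefficient(table):
    """Pearson's contingency coefficient of a two-way table of counts.

    table: list of rows, each a list of non-negative cell counts.
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:  # cells in empty rows or columns contribute nothing
                chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / (chi2 + n))
```

The coefficient is 0 for a table with exactly independent rows and columns and grows with the strength of association, so the absolute difference between the value in the AMC data and the value in the pseudo universe gives a simple summary of how well a pairwise dependence has been reproduced.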
A more detailed insight into the structure of the pseudo universe is given in Figure 4, which shows the marginal distributions of the variables age, gender and educational level for the federal states Tirol and Wien. Differences between the states are obvious for all three variables: the population in Wien is older, with a higher proportion of females and of people with a higher educational level, especially completed secondary school or university. Differences between realized and expected frequencies from the AMC data are again small. Further results are presented for federal states and for Part A and B in Münnich and Schürle (2003). They show that the generation procedure is fairly successful in rebuilding the global structure as well as the heterogeneity between federal states and parts of the AMC data in the generated pseudo universe.

The Sampling Procedure
The sampling procedure for the simulation study imitates that of the AMC but is not exactly identical to it, see Quatember (2002). For the AMC the proportional stratified sampling of Part A dwellings is realized by a systematic selection. Dwellings are ordered sequentially according to a specific ordering. The systematic selection is carried out with a deterministic starting value and a selection interval chosen to obtain the desired sampling fraction.
This procedure cannot be replicated for the simulation studies as, given the ordering, it is purely deterministic. Moreover, not all variables used to determine the ordering in the AMC are generated in the pseudo universe. Therefore, in the simulation studies a systematic sampling of dwellings in each stratum with a random starting value per stratum is carried out. Dwellings are ordered according to their dwelling number, which for permanent residence dwellings corresponds to the order of their generation.
In Part B of the AMC, PSUs are selected according to a proportional stratified sampling. The PSUs of one stratum are selected randomly, with manual control to guarantee a uniform regional distribution of the selected PSUs. In the second stage dwellings are selected systematically with a fixed starting value and a specific ordering of the dwellings (according to dwelling criteria) within the PSU.
For the simulation studies the adequate number of PSUs is selected randomly. Within a selected PSU, dwellings are ordered according to their dwelling number and selected systematically with a random starting value.
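The simulation variant of the sampling procedure can be sketched as below. This is a minimal illustration under assumptions, not the C++ implementation used for the study: the data layout (ordered lists of dwelling numbers, a dictionary of PSUs) and the function names are hypothetical, and the selection interval is taken as a possibly fractional step $N/n$.

```python
import random

def systematic_sample(units, n, rng):
    """Systematic selection of n units from an ordered list with a random start."""
    N = len(units)
    if not 0 < n <= N:
        raise ValueError("sample size must be between 1 and the number of units")
    step = N / n                   # selection interval (may be fractional)
    start = rng.random() * step    # random starting value in [0, step)
    # The min() guards against floating-point rounding at the upper end.
    return [units[min(int(start + k * step), N - 1)] for k in range(n)]

def two_stage_sample(psus, n_psus, n_dwellings, rng):
    """Part B style selection: random PSUs, then systematic dwellings.

    psus: dict mapping a PSU id to its ordered list of dwelling numbers.
    """
    chosen = rng.sample(sorted(psus), n_psus)  # simple random sample of PSUs
    return {psu: systematic_sample(psus[psu],
                                   min(n_dwellings, len(psus[psu])), rng)
            for psu in chosen}
```

For a Part A stratum only `systematic_sample` is needed, applied to the dwellings of the stratum ordered by dwelling number; for Part B, `two_stage_sample` would be called once per stratum with the number of PSUs allocated to it and 20 or 25 dwellings per PSU, depending on the federal state.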
Furthermore, in contrast to the AMC sampling procedure, only the selection of dwellings for one interview wave, i.e. without rotation, is implemented in the simulation studies.

A Simulation Study
Carrying out simulation studies to gain insight into the properties of estimators is straightforward once the pseudo universe is generated and the sampling procedure is implemented. We demonstrate this in an exemplary simulation study comparing four direct variance estimators. The whole simulation process, i.e. drawing an AMC sample from the pseudo universe and calculating estimates, was repeated 10000 times. The simulation studies were carried out on a Pentium P3 using C++ programs written by the second author.
Useful criteria for comparing variance estimators are bias, mean square error and, as one of the main purposes of variance estimation is to obtain at least approximate confidence intervals for parameters of the universe, the coverage of confidence intervals.
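These three criteria can be computed from the replicates of a simulation run roughly as follows. The sketch assumes that the reference value for a variance estimator is the variance of the point estimates over all replicates and that confidence intervals use the normal approximation; the function and key names are hypothetical.

```python
import math
from statistics import NormalDist, mean, pvariance

def evaluate_variance_estimator(tau_true, tau_hats, var_hats, alpha=0.05):
    """Summarize a variance estimator over simulation replicates.

    tau_true: true total in the pseudo universe
    tau_hats: point estimates of the total, one per replicate
    var_hats: variance estimates, one per replicate
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    # Reference value: simulation variance of the point estimator.
    v_ref = pvariance(tau_hats)
    bias = mean(var_hats) - v_ref
    mse = mean((v - v_ref) ** 2 for v in var_hats)
    # Share of normal-approximation intervals that contain the true total.
    covered = [abs(t - tau_true) <= z * math.sqrt(v)
               for t, v in zip(tau_hats, var_hats)]
    return {"reference_variance": v_ref, "bias": bias, "mse": mse,
            "coverage": sum(covered) / len(covered)}
```

A negative `bias` together with a `coverage` below $1 - \alpha$ is the pattern one would expect for a variance estimator that ignores clustering.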

Variance Estimation of Totals
Estimation of Totals: A total of interest $\tau = \sum_{k \in U} y_k$ of a universe $U$ is usually estimated by the Horvitz-Thompson estimator $\hat\tau_{HT} = \sum_{k \in S} y_k / \pi_k$, where $\pi_k$ is the inclusion probability of unit $k$ in the sample $S$. Published total estimates for the AMC differ from the Horvitz-Thompson estimator, as the weights used differ slightly from the inverse inclusion probabilities and, in addition, non-response is accounted for.
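For reference, the plain Horvitz-Thompson estimator of a total can be written in a few lines. The function name and the input layout (pairs of an observed value and its inclusion probability) are assumptions of this sketch; the estimator actually used for the AMC differs from it, as noted above.

```python
def horvitz_thompson_total(sample):
    """Horvitz-Thompson estimate of a total.

    sample: iterable of (y_k, pi_k) pairs, where y_k is the observed value
    and pi_k the inclusion probability of sampled unit k.
    """
    total = 0.0
    for y, pi in sample:
        if not 0 < pi <= 1:
            raise ValueError("inclusion probabilities must lie in (0, 1]")
        total += y / pi  # each unit is weighted by its inverse inclusion probability
    return total
```

With a 1% sample, for example, every sampled unit has $\pi_k = 0.01$ and therefore represents 100 units of the universe.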
In the following let $A$ and $B$ denote Part A and Part B, $b$ the federal state, $h$ the stratum and $i$ the PSU, $C$ and $c$ the number of PSUs in the universe and in the sample, respectively, and $W$ and $w$ the number of dwellings in the universe and in the sample, respectively. Totals of personal characteristics of the Austrian population are estimated from the AMC data by combining the total estimates $\hat\tau_{Abh}$ for strata in Part A and $\hat\tau_{Bbhi}$ for PSUs in Part B, where the Part B estimates enter with an inflation factor $g_{Bbh}$. The number of dwellings in the sample is $w = w^{(1)} + w^{(2)} + w^{(3)}$, where $w^{(1)}$ is the number of interviewed, $w^{(2)}$ the number of non-responding and $w^{(3)}$ the number of uninhabited dwellings. The inverse of the inflation factor for $T_{Abh}$ is the sampling fraction of dwellings multiplied by the proportion of responding dwellings. Estimation of totals in PSUs of Part B is analogous, with indices $Bbhi$ instead of $Abh$.
In the simulation study three different totals ($\tau_e$ = number of persons with employment status "employed at least one hour", $\tau_u$ = number of persons with educational level "university", $N$ = population size) were estimated using this estimator. The results summarized in Table 4 indicate that the estimates of all three totals are rather close to their respective true values in the pseudo universe.
Different Variance Estimators: Variance estimation is a difficult task, as the variance of an estimator depends on the variation of the characteristic of interest in the universe as well as on the sampling design. Appropriate variance estimators taking into account the specifics of a sampling plan have to be derived for each sampling plan individually. Thus in practical applications simple variance estimators are often used.
In our simulation we compare the performance of four different direct variance estimators. $\hat V_1$, $\hat V_2$ and $\hat V_3$ are simple variance estimators that mostly neglect the complex sampling design of the AMC, whereas the complex variance estimator $\hat V_4$ takes into account the effects of stratification and clustering. For the following definitions of these variance estimators let $N$ and $N_b$ denote the number of individuals in the universe and in federal state $b$, respectively, and $n$ and $n_b$ their number in the sample.
• $\hat V_1$ is the appropriate variance estimator under simple random sampling without replacement.
• $\hat V_2$ takes into account the different sampling fractions per federal state, but still assumes simple random sampling.
• $\hat V_3$ additionally assumes $\hat\tau_b / N_b = \hat\tau / N$. This variance estimator was usually published with the results of the AMC, see Haslinger (1996).
• $\hat V_4$ is the variance estimator of the Horvitz-Thompson estimator, which also takes into account stratification and clustering. In its definition, $W_{xbh}$ and $w_{xbh}$, $x \in \{A, B\}$, are the numbers of dwellings in stratum $xbh$ in the universe and in the sample, respectively, $C_{Bbh}$ and $c_{Bbh}$ are the numbers of PSUs in stratum $Bbh$ in the universe and in the sample, $s^2_{Abh}$ and $s^2_{Bbhi}$ are the sample variances in stratum $Abh$ and PSU $Bbhi$, and $s^2_{Bbh}$ is the variance between PSUs in stratum $Bbh$. In contrast to the simple estimators $\hat V_1$, $\hat V_2$ and $\hat V_3$, knowledge of the population size $N$ is not required for $\hat V_4$. It is therefore a useful variance estimator for estimators of a population size.
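The exact expressions of the four estimators are not reproduced in this text. As a point of orientation only, the textbook variance estimator of an estimated total under simple random sampling without replacement, $N^2 (1 - n/N)\, s^2 / n$, and its sum over federal states can be sketched as follows. Whether the simple estimators of the study have exactly this form is an assumption, and the function names are hypothetical.

```python
from statistics import variance

def srs_variance_of_total(y, N):
    """Textbook variance estimate of an estimated total under simple random
    sampling without replacement: N^2 * (1 - n/N) * s^2 / n.

    y: sampled values (e.g. 0/1 indicators of being employed)
    N: number of units in the universe
    """
    n = len(y)
    if not 1 < n <= N:
        raise ValueError("need at least two sampled units and n <= N")
    return N ** 2 * (1 - n / N) * variance(y) / n

def statewise_srs_variance(samples, sizes):
    """Sum of SRS variance estimates computed separately per federal state.

    samples: dict mapping a state to its sampled values
    sizes:   dict mapping a state to its universe size N_b
    """
    return sum(srs_variance_of_total(samples[b], sizes[b]) for b in samples)
```

The first function ignores the design completely; the second respects different sampling fractions per federal state but still treats the sample within a state as a simple random sample, so neither accounts for the clustering of persons in dwellings and PSUs.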
Table 5 gives the results of the simulation study for the estimated standard errors $\hat s_i = \sqrt{\hat V_i}$, and Figure 5 shows boxplots of the distributions of all 4 standard error estimators for the totals $\tau_e$ and $\tau_u$. The reference value for the performance of standard error estimators is the standard error in the simulation study. Obviously $\hat s_1$, $\hat s_2$ and $\hat s_3$ underestimate the true standard error in the simulation. As the total estimator $\hat\tau$ is more precise than the Horvitz-Thompson estimator, $\hat s_4$ is slightly biased upwards.
A $(1 - \alpha)$-confidence interval for a total $\tau$ based on the normal approximation is obtained from an asymptotically unbiased estimate $\hat\tau$ and a variance estimate $\hat V(\hat\tau)$ as $\hat\tau \pm z_{1-\alpha/2} \sqrt{\hat V(\hat\tau)}$, where $z_{1-\alpha/2}$ denotes the $(1-\alpha/2)$-quantile of the standard normal distribution. The most serious consequence of an underestimation of standard errors is that confidence intervals do not reach the nominal coverage. Actual coverages can be far too low for simple variance estimators, as can be seen from Table 5; only confidence intervals based on $\hat s_4$ reach the nominal confidence level. Consequences are even worse for smaller areas, e.g. federal states. Figure 6 compares the non-coverage, i.e. the proportion of 95%-confidence intervals not covering the true value $\tau_e^b$ of federal state $b$, for $\hat s_1$ and $\hat s_4$. Note that for federal states $\hat s_2$ and $\hat s_3$ coincide with $\hat s_1$. Non-coverage is about 5% for confidence intervals based on $\hat s_4$, but the simple estimator $\hat s_1$ leads to an actual non-coverage ranging from 10% to nearly 30%.
Austrian Journal of Statistics, Vol. 35 (2006), No. 4, 419-435
The results indicate a design effect greater than 1 and show that simple variance estimators, though widely used in practice, can severely underestimate the sampling error for the complex sampling design. Confidence intervals based on these estimators have an actual coverage far below the nominal level. Complex variance estimation, taking into account the stratification and clustering of the sampling plan, results in less biased estimates and in confidence intervals attaining the nominal coverage.
That simple variance estimators may be biased downwards is well known from theoretical results; this simulation study makes it possible to quantify the extent of this bias in a close-to-reality situation.

Effects of Modification of Pseudo Universes
Effects of the sampling frame or of non-sampling errors, such as non-response, on variance estimation can be investigated rather easily by modifying the pseudo universe. To illustrate this point, two modifications of the first pseudo universe, called PU1 in the following, were realized:
1. Pseudo universe PU2 comprises only inhabited dwellings and thus implies a different sampling frame. It was obtained from PU1 by removing all uninhabited dwellings.
2. Pseudo universe PU3 makes it possible to study the effects of a certain non-response mechanism. It was created by implementing a unit non-response mechanism. For every dwelling a 0-1 random variable (1 indicating response, 0 non-response of all individuals in this dwelling) was generated according to the non-response rate of the respective generation group in the AMC data. This was the only available information about non-response in the AMC. Implementation of a more realistic non-response mechanism, e.g. non-response probabilities depending on the number of households or inhabitants of a dwelling, would require further information which is not available from the AMC data.
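The unit non-response mechanism of PU3 can be sketched as one Bernoulli draw per dwelling with the non-response rate of its generation group. The data layout (pairs of dwelling id and generation group, a dictionary of rates) and the function name are assumptions of this sketch.

```python
import random

def simulate_unit_nonresponse(dwellings, nonresponse_rate, rng):
    """Assign a response indicator to every dwelling.

    dwellings:        iterable of (dwelling_id, generation_group) pairs
    nonresponse_rate: dict mapping a generation group to its non-response rate
    Returns a dict mapping dwelling_id to 1 (response) or 0 (non-response).
    """
    return {dwelling: int(rng.random() >= nonresponse_rate[group])
            for dwelling, group in dwellings}
```

A dwelling responds with probability one minus the rate of its group, independently of all other dwellings and of its own characteristics, which is exactly the simplification noted in the text.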
Simulation results for the estimation of the total $\tau_e$ in the 3 different universes are given in Table 6; results for the standard error estimators of $\hat\tau_e$ are presented in Table 7. Obviously non-response is not quite adequately accounted for, as $\hat\tau_e$ is more biased upwards in PU3 than in PU1 and PU2.
Standard errors are lower for PU2, as sampling from a universe without uninhabited dwellings implies a higher number of sampled individuals under the AMC sampling plan. The percentage of uninhabited dwellings is 14.5% in PU1, leading to a reduction in the standard error of about 8.9% for PU2 compared to PU1. Implementation of non-response amounts to a reduction of the number of sampled respondents, thus leading to an increase of the standard error. The overall non-response rate is 12.2% of inhabited dwellings, which leads to an increase of the standard error of 4.9% in PU3 compared to PU1.
Results for the standard error estimators are similar to those presented above and again show the better performance of the complex estimator $\hat s_4$. In each of the pseudo universes only $\hat s_4$ does not underestimate the true standard error and thus leads to confidence intervals attaining the nominal coverage.

Summary
For surveys with complex sampling designs, variance estimators cannot be compared on theoretical grounds. The aim of this paper was to show that a comparison via simulation is feasible also for large universes.
Usually real universes are not available; therefore, as a first step, synthetic universes have to be generated. In the DACSEIS project a generation process for pseudo universes, consisting of a deterministic part and a stochastic part, was developed. This process is described for the special case of a pseudo universe for the sampling plan of the Austrian Microcensus, 1994-2003. The structure of the universe concerning stratification and clustering is rebuilt according to the sampling plan, assuming the number and sizes of strata and clusters to be deterministic. Sampling units of the AMC are dwellings. In the pseudo universe synthetic dwellings, including the households and individuals living therein, were generated stochastically.
To illustrate the application of the simulation setup, a simulation study is presented in which 10000 samples according to the AMC sampling plan were drawn and estimates and direct variance estimates were calculated for each sample. True values of the quantities of interest in the pseudo universe are known, and the simulation distribution of e.g. a total estimator provides a reference value for the performance of different variance estimators. The results show, not unexpectedly but nevertheless often ignored in applied work, that simple variance estimators underestimate the true sampling error, and they give a drastic impression of the extent of this bias.
The simulation setup described above is currently used for assessing properties of different methods for variance estimation under non-response; for first results see Quatember (2005).

Figure 1: Structure of the universe

Figure 2: Hierarchical structure within dwellings
The generation groups represented in the AMC data comprise 1189 Part A strata and 368 sample PSUs of Part B. As a consequence of the different sampling plans for Part A and B, every generation group of Part A (i.e. every stratum) is represented in the AMC data set, but not every generation group of Part B (i.e. every PSU). That means that AMC data are available for each generation group of Part A, but only for sample generation groups of Part B.

Figure 3: Marginal frequency distributions within the AMC pseudo universe and the data

Figure 4: Marginal frequency distributions within federal states TIR and WIE of the AMC pseudo universe

The inverse of the inflation factor $g_{Bbh}$ is the proportion of dwellings in sample PSUs among all dwellings in a stratum of Part B. It differs from the weight of the Horvitz-Thompson estimator, as the inclusion probability of PSUs is $c/C$. Totals of strata in Part A are estimated as weighted sample totals $T$.

Figure 5: Boxplots for standard error estimates of $\tau_e$ (left) and $\tau_u$ (right)

Figure 6: Percentage of confidence intervals not including the true $\tau_e^b$ for federal states

Table 1: Partition of federal states of the AMC pseudo universe into parts and strata
Table 1 gives the number of strata per part and federal state. The total number of PSUs within Part B is 1710, leading to a total of 2899 generation groups. The sizes of the generation groups, i.e. the number of dwellings, are regarded as deterministic.

Table 2: Personal characteristics included in the AMC pseudo universe
Stochastic Part of the Generation Process: Conditional on the size, dwellings, households, persons, and personal characteristics are generated stochastically. The stochastic part of the generation process is identical within a generation group and different between generation groups, as empirical distributions from AMC data (of a Part A stratum or a Part B PSU) serve as generation distributions.
Let $p$ denote the number of persons in a given household (we drop the index $k$ from now on). Then first, for all persons in this household, values for the variables $x_1$ and $x_2$, i.e. age and gender, are generated in one step as $(x_{11}, \ldots, x_{1p}, x_{21}, \ldots, x_{2p}) \sim P^p_{(x_{11},\ldots,x_{1p},x_{21},\ldots,x_{2p})}$.

Table 3: Contingency coefficients within the AMC data and the Austrian pseudo universe

Table 4: Simulation results for the estimation of totals

Table 5: Simulation results for standard error estimators of totals

Table 6: Simulation results for the estimation of $\tau_e$ in modified pseudo universes

Table 7: Simulation results for standard error estimators in modified pseudo universes