Eva-Maria Asamer, Franz Astleithner, Predrag Cetkovic, Stefan Humer, Manuela Lenk, Mathias Moser and Henrik Rechta
Quality assessment for register-based statistics - Results for the Austrian census 2011

In 2011, Statistics Austria carried out its first register-based census. The advantages of using administrative data for statistical purposes include a reduced burden for respondents and lower costs for National Statistical Institutes (NSIs). However, new challenges arise, such as the need for a new approach to assessing the quality of this kind of data. Therefore, Statistics Austria developed a comprehensive standardized framework to evaluate data quality for register-based statistics. In this paper, we present the basic concept of this quality framework and provide detailed results from the quality evaluation of the Austrian census of 2011. More specifically, we derive a quality measure for each census attribute from four complementary hyperdimensions. The first three of these hyperdimensions address the documentation of data, the usability of records and an external data validation. The fourth hyperdimension focuses on the quality of data imputations. The proposed framework combines these different quality-related information sources for each attribute to form an overall quality indicator. This procedure makes it possible to track changes in quality during data processing and to compare the quality of different census generations.


Introduction
The importance of administrative data as an input for statistical purposes has increased steadily in recent decades. Following the example of the Scandinavian countries, about one third of the United Nations Economic Commission for Europe (UNECE) members now base their census at least partially on administrative data (UNECE 2014). In Austria, the last survey-based census was conducted in 2001 and was replaced with a register-based census in 2011. The advantages of the new approach are manifold and include a reduced burden for respondents and lower costs. However, quality assessment for administrative data has received little attention in the statistical literature. In what follows, we present a standardized quality framework for the assessment of administrative data, which was developed by Statistics Austria in the course of the Austrian register-based census of 2011. The procedure aggregates all available (meta-)information to generate a single quality indicator for each attribute and each statistical unit.
This paper is structured as follows. Section 2 introduces the data sources for the register-based census. In section 3, the quality framework is introduced using the example of the quality assessment for the variable Legal Marital Status (LMS). Section 4 provides summary results for the overall quality assessment of the Austrian census, and section 5 concludes.

Sources for the register-based census
A decisive quality-related topic for register-based statistics is the selection of appropriate data sources that supply the required information. To give an overview of the data sources used in the Austrian census, Figure 1 illustrates the connections between the actual data sources (administrative registers), the topics of the census (data cubes) and the final census data, so-called statistical registers, which are derived from the data cubes. The administrative registers provide information for the register-based census (first row). This administrative information is gathered in data cubes (second row). The actual census data are "statistical registers", i.e. they are derived from the data cubes (third row).
For the purpose of the census, Statistics Austria distinguishes between seven base registers and eight comparison registers. While the base registers are needed to provide all attributes of interest for the register-based census, the subset of grey-shaded registers forms the backbone of the census. In particular, these registers determine the population count, the number of buildings and dwellings and the number of enterprises as well as their local units. The base registers are maintained by Statistics Austria. To further improve the quality of the census, the base registers are supported by the eight comparison registers, which gather additional information for cross-checks from more than 50 external data holders. Where more than one autonomous data source is available for a specific attribute, the different registers are used to mutually validate information. Accordingly, we apply a principle of redundancy to improve the quality of the data (Lenk 2008, p. 3).

The quality assessment of administrative data
National Statistical Institutes (NSIs) often rely on external data sources for which they are not directly responsible. Such third-party data sources may constitute a large part of the required information, e.g. for population censuses, as is the case for Austria. Hence, a comprehensive quality assessment is essential when producing register-based statistics. The following approach for the evaluation of administrative data extends previous work by other NSIs in this field (Daas, Ossen, Vis-Visschers, and Arends-Tóth 2009; Daas and Fonville 2007) and relies in particular on four quality-related hyperdimensions (Berka et al. 2010; Berka, Humer, Lenk, Moser, Rechta, and Schwerer 2012).
Besides these quality dimensions, the actual data processing for the Austrian census is conducted in three stages that have to be considered in the quality assessment: the raw data (i.e. the administrative registers $i$), the combined dataset of these raw data (Central Data Base, CDB) and the final dataset, which includes imputations (Final Data Pool, FDP). The latter two are referred to as "statistical registers". Figure 2 illustrates the data processing, beginning with the delivery of raw data from the various administrative data holders. The four hyperdimensions, named $HD^D$, $HD^P$, $HD^E$ and $HD^I$, aim to assess the quality for different types of attributes at all stages of the data processing. This yields quality measures which are standardized between zero and one, where a higher value indicates better data quality.
In this process, $HD^D$ describes quality-relevant processes at the register authority, $HD^P$ deals with formal errors in the data and $HD^E$ measures the data quality in comparison to an external source. Finally, $HD^I$ measures the quality of the imputations. $q^n_{i,j}$ denotes the combined quality measure of $HD^D$, $HD^P$ and $HD^E$ for an attribute $j$ in a register $i$ at the observation level $n$ (e.g. $q_{1,A}$ describes the quality of attribute A in register 1).
The data from Austrian administrative registers can be linked through unique personal identifiers (the so-called branch-specific personal identification number, bPIN) and are collected in data cubes, namely the CDB. We distinguish three types of attributes in the CDB, which have to be treated differently from a quality perspective. Unique attributes (see attribute C in Figure 2) exist only in a single administrative register. Multiple attributes are available from more than one administrative register (see attribute A). Derived attributes are not covered by any register in the required form and therefore have to be derived from related information (see attributes F and G).
We denote the combination of the raw data quality measures on the CDB level as $q^n_{,j}$. For multiple and derived attributes, the CDB is then validated again using the hyperdimension External Source ($HD^E$), to capture quality-related effects of the register aggregation process. The final quality indicator at this stage is given by a combination of $q^n_{,j}$ and the $HD^E$ quality, which we refer to as $q^n_{\Psi,j}$, the effective CDB quality indicator. Finally, the quality of imputed values has to be taken into account. Before the imputation process, the quality of missing values on the raw and CDB levels is zero by definition. The hyperdimension $HD^I$ assesses the quality of the imputations and replaces the zero default quality for missing values with a new value, which depends on the imputation method. Again, every statistical unit $n$ obtains a quality indicator for every attribute $j$, which we denote $q^n_{\Omega,j}$, the final FDP quality measure.
Based on this framework, we illustrate the calculation of the quality indicators in detail in the following, using the example of the attribute Legal Marital Status (LMS). This process includes the assessment of the quality on the raw data level for each observation in each register as well as the evaluation of the statistical registers CDB and FDP.

The raw data level
The quality assessment starts by collecting quality information at the raw data level (left boxes in Figure 2), which is obtained via the three hyperdimensions Documentation ($HD^D$), Pre-processing ($HD^P$) and External Source ($HD^E$). A detailed explanation of each hyperdimension is given below.

Hyperdimension Documentation ($HD^D$)
$HD^D$ describes quality-related processes as well as the data documentation (metadata) at the administrative authorities. Data holders are assigned a degree of confidence and reliability, which is monitored using a questionnaire on quality-relevant procedures that contains several open and scored questions. Administrative authorities are requested to answer the questionnaire for every attribute separately, to control for different treatment of single variables. The scored questions are used for the quality assessment and can be divided into four subgroups, covering data historiography, definitions, administrative purpose and data treatment. Through these scored questions, every attribute $j$ in each administrative register $i$ obtains a quality measure $HD^D_{i,j}$ at this stage, as described by equation 1.
The set of open questions serves as complementary information but is not considered in the quality assessment. Table 1 gives an overview of the current set of questions.

Note on Figure 2: On the raw level, every attribute (A-E) in the administrative registers REG1 and REG2 is evaluated through the hyperdimensions $HD^D$, $HD^P$ and $HD^E$. Combining all quality-related information yields the CDB quality measure $q_{\Psi,j}$, measured on the level of statistical units. In the final step, the quality of imputations is assessed for the FDP using $HD^I$. At the end of the process every statistical unit has obtained a quality value between zero and one.

For the example of the LMS, data are obtained from eleven source registers, which have to be assessed individually. The calculation of the hyperdimension Documentation ($HD^D$) for each source register is illustrated in Table 2.
The data holders answer quality-related questions on a dichotomous (yes/no) or an ordinal scale. For each question, a higher value indicates better quality-related performance of the register. Furthermore, each question is weighted by its relative importance, which was determined by experts in the field of register-based statistics at Statistics Austria. The metadata quality for each register is summarized as the weighted average of these scored questions. For example, a value of 1 for the question "Definitions" in the Central Population Register (CPR) indicates that the definition of the Legal Marital Status is consistent between the CPR and the register-based census.
In practice, data for a single comparison register may be delivered by up to 20 different data authorities (e.g. regional offices). In such cases, the hyperdimension Documentation is applied separately to each delivery, and the deliveries are then aggregated into one comprehensive comparison register. To assess the $HD^D$ quality for such complex cases, the relative contribution of each (regional) delivery to the comparison register (in terms of observations) is used to compute a weighted average of the questionnaire scores. One example of such a register is the Social Welfare Register (SWR). Consider the "Cut-off date" question in Table 2. The indicator yields an average value of 0.47 over all data deliveries. Equivalently, data copies for a specific reference date are available for only 47 per cent of the records in the SWR.
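The two averaging steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used by Statistics Austria: the function names, the expert weights, the scores and the delivery sizes are all hypothetical and chosen only so that the record-weighted "Cut-off date" score reproduces a value of 0.47.

```python
def hd_documentation(scores, weights):
    """Weighted average of scored questionnaire answers (each scaled to [0, 1])."""
    total_weight = sum(weights[q] for q in scores)
    return sum(scores[q] * weights[q] for q in scores) / total_weight


def aggregate_deliveries(delivery_scores, n_records):
    """Average one question's score over deliveries, weighted by record counts."""
    total = sum(n_records)
    return sum(s * n for s, n in zip(delivery_scores, n_records)) / total


# Hypothetical expert weights for the four question subgroups (not from the paper).
weights = {"historiography": 1.0, "definitions": 2.0, "purpose": 1.0, "treatment": 1.0}
# Hypothetical scores of one attribute in one register.
scores = {"historiography": 0.5, "definitions": 1.0, "purpose": 1.0, "treatment": 0.75}
hd_d = hd_documentation(scores, weights)

# Three hypothetical regional deliveries answering a yes/no question (1 = yes),
# contributing 300, 530 and 170 records to the combined register.
cut_off = aggregate_deliveries([1, 0, 1], [300, 530, 170])

print(round(hd_d, 3), round(cut_off, 2))
```

With a dichotomous question, the record-weighted average is simply the share of records that come from deliveries answering "yes", which is why the aggregated score can be read as a percentage of records.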
The weighted average of the scores on the different questions forms the quality indicator on the register level ($HD^D_{LMS}$). Results are shown in the last row of Table 2. A low value, as presented for the Asylum Seekers Register (ASR), indicates poor performance of the attribute in this administrative register with respect to the hyperdimension $HD^D$. In contrast, a high value, as found in the Tax Register (TR), indicates good performance.

Hyperdimension Pre-processing ($HD^P$)
The final result of the hyperdimension Pre-processing ($HD^P$) is given by the ratio of usable records to the total number of records (see equation 2).

$$HD^P_{i,j} = \frac{\text{usable records}}{\text{total number of records}} \qquad (2)$$

$HD^P$ results for the LMS in the source registers are shown in Table 4. In this case, most data sources provided formally correct information, resulting in indicators close to 1. The exceptions are the Asylum Seekers Register (ASR) and the Social Welfare Register (SWR), where a substantial share of records lack unique personal identifiers (56.1 per cent and 14.4 per cent respectively). Accordingly, this procedure yields lower quality indicators for these registers.

Hyperdimension External Source ($HD^E$)
The hyperdimension External Source ($HD^E_{LMS}$) is again computed on the raw data level and assesses the data quality of the source registers in comparison to an external source, which in our case is the Austrian microcensus. The quality indicator is calculated as the number of consistent values between each register and the microcensus, divided by the number of all records that could be linked to the microcensus (see equation 3).

$$HD^E_{i,j} = \frac{\text{number of consistent values}}{\text{total number of linked records}} \qquad (3)$$

In Table 5, we present results for the comparison of LMS between the raw registers and the microcensus. For example, 1,239 individuals from the Unemployment Register (UR) could be linked to the microcensus. Of these observations, 1.9 per cent were classified as inconsistent. This yields an $HD^E_{UR,LMS}$ value of 0.981 for the LMS in the UR.
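Both ratios are simple enough to check by hand; the following Python sketch is only meant to make the bookkeeping explicit. The function names and the record counts passed to `hd_preprocessing` are hypothetical. For the Unemployment Register, the text reports 1,239 linked records and 1.9 per cent inconsistent values; the absolute count of 24 inconsistent records used here is an assumption derived from that rounded percentage.

```python
def hd_preprocessing(n_obs, missing_bpin, non_response, out_of_range):
    """Share of usable records: observations minus the three kinds of unusable records."""
    usable = n_obs - missing_bpin - non_response - out_of_range
    return usable / n_obs


def hd_external(n_linked, n_inconsistent):
    """Share of linked records whose value agrees with the external source."""
    return (n_linked - n_inconsistent) / n_linked


# Hypothetical register with 10,000 records, 14.4 per cent of them without a bPIN.
hd_p = hd_preprocessing(n_obs=10_000, missing_bpin=1_440, non_response=50, out_of_range=10)

# Unemployment Register: 1,239 linked records; 24 inconsistent ones is an
# assumed count that corresponds to roughly 1.9 per cent.
hd_e_ur = hd_external(n_linked=1_239, n_inconsistent=24)

print(round(hd_p, 3), round(hd_e_ur, 3))
```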

Final quality on the raw-data level
Given these three quality measures, an overall quality indicator for each attribute on the register level can be derived as the weighted average described in equation 4. Although these indicators vary only by register and attribute, not by observation, they are still attached to each observation, so that the quality of a single record can be traced throughout the process. In our framework, each hyperdimension has the same weight ($v_D = v_P = v_E = 1/3$), reflecting our assumption that each dimension has an equal impact on the quality measure.
The resulting value summarizes the existing quality-related information for each attribute j in each register i.
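As a rough illustration, the combination step can be written as a weighted sum of the three hyperdimension scores with weights that add up to one. The sketch below assumes exactly that form; the function name and the $HD^D$ and $HD^P$ inputs are hypothetical, while 0.981 is the $HD^E$ value reported for the LMS in the Unemployment Register.

```python
def raw_quality(hd_d, hd_p, hd_e, v_d=1 / 3, v_p=1 / 3, v_e=1 / 3):
    """Weighted average of the three raw-level hyperdimensions (assumed form)."""
    if abs(v_d + v_p + v_e - 1.0) > 1e-9:
        raise ValueError("weights must sum to one")
    return v_d * hd_d + v_p * hd_p + v_e * hd_e


# Hypothetical HD^D and HD^P scores combined with the reported HD^E of 0.981.
q_register = raw_quality(hd_d=0.85, hd_p=0.95, hd_e=0.981)
print(round(q_register, 3))

# The same value is attached to every record delivered by this register,
# so that it can be traced through the later processing stages.
q_per_record = [q_register] * 5
```

Because the weights are parameters, other weighting schemes can be tried without touching the rest of the calculation.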
Table 6 summarizes the combined information for the attribute LMS for each register. Accordingly, we obtain eleven quality indicators for the LMS. The Asylum Seekers Register (ASR) has the lowest quality measure, while the Child Allowance Register (CAR) delivers the best quality for the LMS. The differences in quality result partly from the different population subgroups covered by the individual registers (families with young children vs. foreign persons). A further reason is that the importance of the LMS varies between register authorities: it is highly relevant for the CAR but less important for the ASR, which also influences the quality outcomes.
These quality indicators on register level serve as the main input to the next step of the framework, the evaluation of the LMS in the CDB.

The Central Data Base (CDB)
In the next step, the data from the raw registers are combined in the Central Data Base (CDB), which covers all attributes of interest for the register-based census. At this level, a quality indicator $q_{,j}$ is computed for each attribute $j$ and statistical unit $n$. For the evaluation of quality in the CDB we need to distinguish the three types of attributes described earlier: unique, multiple and derived.
LMS is a multiple attribute and accordingly appears in several registers. Since multiple data sources provide the LMS, a predefined ruleset, based on the experience of Statistics Austria, picks the most appropriate value according to the constellation of values in the source registers. To assess the validity of the chosen value, all available information on the LMS from all registers is taken into account. To aggregate this information, the Dempster-Shafer Theory (DST, see Shafer 1992) for the combination of evidence is applied to derive a quality measure for this attribute for each statistical unit. Berka et al. (2012) give a detailed explanation of how DST can be applied to the assessment of quality.
To combine the different quality measures for the LMS on the raw data level, they are interpreted as beliefs in the degree of correctness of each data source. In such a setting, DST combines the existing evidence and takes all available information from the registers into account to form one quality indicator on the CDB level, denoted $q_{,j}$, for each statistical unit $n$.
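To make the idea of combining beliefs concrete, the following Python sketch implements the textbook form of Dempster's rule of combination for mass functions over a small frame of discernment. It is an illustration under stated assumptions, not the exact formulation of Berka et al. (2012): each register is assumed to put mass equal to its raw quality on the value it reports and the remainder on the whole frame (ignorance). The two-value frame, the function names and the quality values 0.9, 0.8 and 0.6 are hypothetical.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two mass functions (dict: frozenset -> mass) with Dempster's rule."""
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            conflict += mass_a * mass_b  # mass assigned to contradictory evidence
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    return {k: v / (1.0 - conflict) for k, v in combined.items()}


def register_evidence(value, quality, frame):
    """Assumed mapping: mass `quality` on the reported value, the rest on the frame."""
    return {frozenset([value]): quality, frozenset(frame): 1.0 - quality}


frame = {"married", "single"}  # simplified frame, for illustration only
m1 = register_evidence("married", 0.9, frame)
m2 = register_evidence("married", 0.8, frame)
m3 = register_evidence("single", 0.6, frame)

agree = dempster_combine(m1, m2)          # two registers reporting the same value
with_conflict = dempster_combine(agree, m3)  # a third register disagrees

married = frozenset(["married"])
print(round(agree[married], 3), round(with_conflict[married], 3))
```

In this toy setting, two agreeing registers raise the belief in "married" above either individual quality, while a conflicting third register lowers it again. This mirrors the intuition that redundant sources should increase confidence only when they agree.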
In a further step, the values in the CDB are compared to an external source (reapplying $HD^E$), to address possible quality issues in the process of combining the raw data into the CDB. The process of picking the values for the CDB has to be independent of the data generation; otherwise the results of the quality assessment would be skewed. For this reason, the belief values cannot be the starting point for picking values from the source registers. The evaluation yields a final quality indicator in the CDB, $q_{\Psi,j}$. Table 7 shows the final quality measure on CDB level for the LMS ($q_{\Psi,LMS}$), which is the weighted average of $q_{,LMS}$ and $HD^E_{CDB,LMS}$ on CDB level. In this example, $q_{\Psi,LMS}$ is 0.728. Hence, $HD^E_{CDB,LMS}$ slightly increases the quality indicator compared to the combination of raw quality indicators alone.

The Final Data Pool (FDP)
In the last step of the data generation, values that are missing in the CDB are imputed. To assess the data quality of these imputations, the hyperdimension $HD^I$ is applied. The distinction between imputation methods is a crucial factor for this task (Kausl 2012). The Austrian census uses different methods, such as deterministic editing, hot-deck techniques and logistic regressions. However, the principle for the evaluation of the imputations is the same for all methods: it is based on both the quality of the inputs and the quality of the imputation model. The quality of the inputs is assessed as a weighted average of the quality of the input variables $k$, which has already been calculated at the CDB stage.
The accuracy of the imputation models $m$ is consistently assessed using classification rates ($\Phi$), as shown in equation 5, where $q_{\Omega,k}$ is the quality of a specific input variable $k$ and $\Phi^m_j$ is the classification rate for a certain imputation method $m$ applied to attribute $j$.
The classification rate reflects the share of correctly imputed values when the model is applied to existing data. Using both the quality of the input variables for the imputation process and the classification rate, the quality of the imputations is derived as the observation-weighted average of both indicators. For a detailed explanation of the quality evaluation of different imputation techniques in this framework, see Schnetzer, Astleithner, Ćetković, Humer, Lenk, Moser, Schwerer, and Rechta (2015).
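The following Python sketch illustrates the two ingredients named above and how imputed records then replace the zero default quality. It is a simplified illustration under explicit assumptions: the exact form of equation 5 is not reproduced here, so the sketch combines input quality and classification rate with equal weights, which is an assumption made only for this example. All names and numbers are hypothetical.

```python
def classification_rate(true_values, predicted_values):
    """Share of existing values that the imputation model reproduces correctly."""
    correct = sum(t == p for t, p in zip(true_values, predicted_values))
    return correct / len(true_values)


def input_quality(qualities, weights):
    """Weighted average of the qualities of the input variables k."""
    return sum(q * w for q, w in zip(qualities, weights)) / sum(weights)


def hd_imputation(q_inputs, phi, w_inputs=0.5):
    """ASSUMPTION: equal-weight combination of input quality and classification rate."""
    return w_inputs * q_inputs + (1.0 - w_inputs) * phi


# Apply a (hypothetical) imputation model to records whose true value is known.
observed = ["single", "married", "married", "divorced", "single"]
model_output = ["single", "married", "single", "divorced", "single"]
phi = classification_rate(observed, model_output)

q_in = input_quality([0.99, 0.95], [1.0, 1.0])  # two hypothetical input variables
hd_i = hd_imputation(q_in, phi)

# Missing values carry quality zero in the CDB and receive HD^I in the FDP.
cdb_quality = [0.93, 0.0, 0.95, 0.0]
is_missing = [False, True, False, True]
fdp_quality = [hd_i if m else q for q, m in zip(cdb_quality, is_missing)]

mean_cdb = sum(cdb_quality) / len(cdb_quality)
mean_fdp = sum(fdp_quality) / len(fdp_quality)
print(round(phi, 2), round(hd_i, 3), round(mean_cdb, 3), round(mean_fdp, 4))
```

The last lines show why the average quality can only stay the same or rise from the CDB to the FDP in this scheme: imputation replaces zeros with non-negative values and leaves all other records unchanged.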
Table 8 shows the change in the average quality level from the CDB to the FDP stage. The average quality in the CDB is 0.728 ($q_{\Psi,LMS}$). The missing records on CDB level, which were assumed to have a quality of zero, are imputed on the FDP level and obtain a quality measure as described above. The average imputation quality $HD^I$ for the LMS is 0.956. Since missing values now have a quality indicator larger than zero, the average quality of the LMS has increased in the FDP ($q_{\Omega,LMS} = 0.949$) compared with the CDB ($q_{\Psi,LMS} = 0.728$).

The quality framework outlined above is used for the quality assessment of the whole Austrian register-based census of 2011. To provide an overall picture of the quality assessment, we present the quality measures for selected attributes on the CDB and FDP levels.
Table 9 shows that while all attributes listed here have received a combined quality indicator on the CDB level (column 2), only multiple attributes and those derived on the CDB level have received an additional $HD^E$ check (column 3). For unique attributes, such as Educational Attainment, this is not necessary, since the CDB values are based directly on those available in the single source register. The final CDB indicators in column 4 then represent either the combined CDB value from column 2 (for unique attributes) or the weighted average of columns 2 and 3 (for multiple attributes and attributes which are derived on CDB level).
The next column (5) gives an overview of the average quality of imputations. However, not all of the attributes had to be imputed. Only those attributes with imputations received a quality indicator $HD^I$ at this stage. Column 7 lists the number of imputations, which can be interpreted as the contribution of $HD^I$ to the final FDP quality indicator in column 8. Attributes derived from imputed data sets (only available in the FDP) are consistently compared to an external source at this stage (column 6).
In the following, we illustrate some of the main findings for different kinds of attributes. For example, the quality of the multiple attribute Age has been compared to an external source ($HD^E$) on CDB level, which in general confirmed the combined raw quality indicator (0.997 versus 0.999). Additionally, records have been imputed on FDP level. The quality of these imputations is notably lower than that of the data available from registers (0.731 versus 0.998), but the imputations still lead to a marginal improvement, since the few missing values had so far entered with the default quality of 0.
The highest data quality, according to our framework, is found for Sex, which is available in a large number of high-quality registers. Additionally, no imputations are necessary, so that the data quality on FDP level is the same as for the CDB. An example of a unique attribute is Educational Attainment. By definition, the quality of such attributes in the source registers is equal to that in the CDB. However, a substantial number of values (293,698) need to be imputed on the FDP level. While the quality of imputed values (0.595) is lower than that of the CDB (0.791), the imputations still lead to an increase in the final quality measure (0.815) because missing values were valued at zero until this stage.
An extremely simple case is the unique attribute Field of Educational Attainment, which required no imputations, resulting in an equal quality of 0.819 for the raw register level, the CDB and the FDP. As a further special example, Family Status is a derived attribute for which the derivations have been conducted on FDP level. This reflects the fact that it is constructed from already imputed datasets, which are only available in the FDP. Therefore, the comparison to an external source is consistently carried out on the FDP level, as opposed to e.g. Current Activity Status, for which this process can be carried out in the CDB.
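The Educational Attainment figures allow a back-of-the-envelope consistency check. The Python sketch below assumes that the FDP average equals the CDB average (with missing values counted as zero) plus the share of imputed records times the average imputation quality; this decomposition is an assumption made for illustration, and since the published figures are rounded, the result should be read as an order of magnitude only.

```python
# Reported values for Educational Attainment.
q_cdb = 0.791        # average CDB quality (missing values counted as zero)
q_imputed = 0.595    # average quality of imputed values
q_fdp = 0.815        # final average FDP quality
n_imputed = 293_698  # number of imputed values

# ASSUMED decomposition: q_fdp = q_cdb + share_imputed * q_imputed
share_imputed = (q_fdp - q_cdb) / q_imputed
implied_units = n_imputed / share_imputed

print(f"implied share of imputed records: {share_imputed:.3f}")
print(f"implied number of statistical units: {implied_units:,.0f}")
```

Under these assumptions, roughly 4 per cent of the records carry an imputed value for this attribute, which corresponds to a base of a little over seven million statistical units.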

Conclusion and outlook
This contribution applies a comprehensive quality framework for the assessment of administrative data to the Austrian census of 2011. The main aim of the framework is to implement an objective procedure for the evaluation of different kinds of administrative data.
The framework is based on a modular design, which means that it can be used for a variety of purposes. We generate quality indicators for the raw data, for their combination and for the combined and imputed data sets, so that the evaluation tracks the whole data generation process. The individual modules are connected through user-defined weights (e.g. the weighting of hyperdimensions), which allow for a flexible application in accordance with the needs of a specific use case.
As a result, we are able to compare data quality between different processing stages, data revisions, registers and even single attributes.Furthermore, all these indicators are calculated on the level of individual observations at every stage, so that quality-related information can also be tracked for data subsets.
A major asset of this approach is that the calculated values can also be used to assess the uncertainty of calculations that make use of administrative data. This possibility is the subject of ongoing research and should be addressed in more detail in future discussions.
From the perspective of NSIs, the framework is especially valuable to monitor data quality over time and assess how it changes as a reaction to e.g. the introduction of new methods or processing tools.For the special application to a register-based census, which has been highlighted in this contribution, the framework allows the monitoring of different data revisions and subsequent censuses to provide detailed quality information to data users and the NSI itself.
The final application of the framework to the Austrian census of 2011 highlights how the quality of single attributes depends on the data authority, the number of available comparison registers and the processes at the NSI (e.g. imputation strategies).
Future research opportunities should be concerned with the applicability of the framework to other areas of interest and the enhancement of the framework to cover even more aspects where possible.

Figure 1: Data sources for the register-based census

Figure 2: Schematic overview of the quality framework for register-based censuses.

Table 1: Scored questions - HD Documentation
Table note: This refers to the availability of data for a particular reference date.

Table 2: Calculation of the hyperdimension Documentation ($HD^D$) for Legal Marital Status (LMS)
Abbreviations: ASR: Asylum Seekers Register, UR: Unemployment Register, RPS: Register of Public Servants of the Federal State and the Länder, CAR: Child Allowance Register, CFR: Central Foreigner Register, CSSR: Central Social Security Register, CHR: Chambers Register, HPSR: Hospital for Public Servants Register, SWR: Register of Social Welfare Recipients, CPR: Central Population Register, TR: Tax Register.

The hyperdimension Pre-processing ($HD^P$) is again based on the actual raw data received by the NSI and computes the share of unusable records (missing identification keys, missing values, values out of range, see Table 3) for each attribute in each register.

Table 3: HD Pre-processing
  Number of observations [Observations]
- Records without unique bPIN [Missing bPIN]
- Records with item non-response (but including unique bPIN) [Non resp.]
- Records with wrong values or values out of range [Out of range]
= Usable records

Table 4: Calculation of the hyperdimension $HD^P$ for the Legal Marital Status (LMS)

Table 5: Calculation of the hyperdimension $HD^E$ for the Legal Marital Status (LMS)

Table 6: Calculation of the quality indicator for the LMS for the registers

Table 7: The quality for the LMS on CDB level

Table 8: Quality monitoring for LMS from CDB to FDP level

Table 9: Results for the Austrian census on FDP level
Table notes: Number of imputations. Imputation methods include deterministic editing. Derived from imputed FDP attributes as well.