Macro-Integration for Solving Large Data Reconciliation Problems

Macro-integration is a well-established technique for the reconciliation of large, high-dimensional tables, applied especially to macro-economic data at national statistical offices (NSOs). The technique is mainly used when data obtained from different sources have to be reconciled at a macro level. New areas of application arise as new data sources become available to NSOs. Often these new data sources cannot be combined at a micro level, and macro-integration could provide a solution in such cases. More research is needed, however, to establish whether macro-integration can indeed be applied in such situations. In this paper we propose two applications of macro-integration techniques in domains other than the traditional macro-economic ones: the reconciliation of tables of a virtual census, and the reconciliation of monthly series of short term statistics figures with the quarterly figures of structural business statistics.


Introduction
Macro-integration is widely used for the reconciliation of macro figures, usually in the form of large multi-dimensional tabulations, obtained from different sources. Traditionally these techniques have been applied extensively in macro-economics, especially in the compilation of the National Accounts, for example to adjust input-output tables to new margins (see, e.g., Stone, Champernowne, and Meade (1942)). The main objective of reconciliation, or macro-integration, is to combine different data at macro level while taking all possible relations between variables into account. Combining different data sources also makes it possible to detect and correct flaws in the data and to improve the accuracy of estimates. The methods for macro-integration have developed over the years into very versatile techniques for integrating data from different sources at macro level. In this paper we propose new applications of macro-integration techniques in domains other than the traditional macro-economic applications.
In this paper we investigate the application of macro-integration techniques in the following areas: reconciliation of tables for the Census 2011 and reconciliation of monthly short term statistics figures with the quarterly structural business statistics figures.
The paper is organized as follows. In Section 2 we give a short outline of the macro-integration methods used in this paper, including the extended Denton method (Denton 1971). The extended Denton method we use in this paper is defined in Bikker, Daalmans, and Mushkudiani (2013). In Section 3, we describe the virtual Census 2011 data at Statistics Netherlands (SN) and the application of a macro-integration method to these data. In Section 4, we do the same for the monthly series of the short term statistics figures. The conclusions can be found in Section 5.

The macro-integration approach
We consider a set of estimates in tabular form. These can be quantitative tables, such as average income by region, age and gender, or contingency tables arising from the cross-classification of categorical variables only, such as age, gender, occupation and employment. If some of these tables have certain margins in common and if these tables are estimated using different sources, these margins will often be inconsistent. If consistency is required, a macro-integration approach can be applied to ensure this consistency.
The macro-integration approach to such reconciliation problems is to view them as constrained optimization problems. The totals from the different sources that need to be reconciled because of inconsistencies are collected in a vector $x$ ($x_i$: $i = 1, \ldots, N$). Then a vector $\tilde{x}$, say, is calculated that is close to $x$, in some sense, and satisfies the constraints that ensure consistency between the totals. For linear constraints, the constraint equations can be formulated as
$$C\tilde{x} = b, \tag{1}$$
where $C$ is a $c \times N$ matrix, with $c$ the number of constraints, and $b$ a $c$-vector. These linear constraints include equality constraints, which set the corresponding margins of tables estimated from different sources equal to each other, as well as benchmarking constraints, which set the estimates of certain margins from all sources equal to some fixed numbers. The equality constraints are likely to apply to common margins that can be estimated from different sample surveys but cannot be obtained from a population register. The benchmarking constraints are likely to apply when the common margins can be obtained from register data, in which case the fixed numbers are the values for this margin obtained from the register.
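Consider a class of penalty functions represented by $(\tilde{x} - x)' A (\tilde{x} - x)$, a quadratic form in the differences between the original and the adjusted vectors, where $A$ is a symmetric, nonsingular $N \times N$ matrix. The optimization problem can now be formulated as:
$$\min_{\tilde{x}} \; (\tilde{x} - x)' A (\tilde{x} - x) \quad \text{subject to} \quad C\tilde{x} = b. \tag{2}$$
In the case that $A$ is the identity matrix, we minimize the sum of squares of the differences between the original and new values, $\sum_{i=1}^{N} (\tilde{x}_i - x_i)^2$. To solve this optimization problem, the Lagrange method can readily be applied. The Lagrangian is
$$L(\lambda, \tilde{x}) = (\tilde{x} - x)' A (\tilde{x} - x) - 2\lambda'(C\tilde{x} - b),$$
with $\lambda$ a vector of Lagrange multipliers. For an optimum, the gradient of $L(\lambda, \tilde{x})$ with respect to $\tilde{x}$ must be zero. This gradient is $2A(\tilde{x} - x) - 2C'\lambda$, and setting it to zero gives
$$\tilde{x} = x + A^{-1}C'\lambda. \tag{3}$$
By multiplying both sides of this equation with $C$ and using equation (1) we obtain for $\lambda$:
$$\lambda = (CA^{-1}C')^{-1}(b - Cx),$$
where $CA^{-1}C'$ is a square matrix that is nonsingular as long as there are no redundant constraints. Substituting this result in (3) leads to the following expression for $\tilde{x}$:
$$\tilde{x} = x + A^{-1}C'(CA^{-1}C')^{-1}(b - Cx). \tag{4}$$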
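The closed-form solution of the constrained quadratic problem can be illustrated with a minimal sketch. It assumes the standard result that the adjusted vector equals the initial vector plus $A^{-1}C'(CA^{-1}C')^{-1}$ times the constraint residual $b - Cx$; the function name `reconcile`, the use of NumPy and the numbers in the example are illustrative assumptions, not part of the original method description.

```python
import numpy as np

def reconcile(x, A_inv, C, b):
    """Adjust x so that C @ x_tilde = b, minimizing (x_tilde-x)' A (x_tilde-x).

    A_inv is the inverse of the weighting matrix A (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    residual = b - C @ x                 # how far x is from satisfying the constraints
    M = C @ A_inv @ C.T                  # c x c, nonsingular without redundant constraints
    lam = np.linalg.solve(M, residual)   # Lagrange multipliers
    return x + A_inv @ C.T @ lam

# Hypothetical example: two estimates of the same margin plus one other total.
x = np.array([102.0, 98.0, 50.0])
C = np.array([[1.0, -1.0, 0.0],   # equality constraint: entries 1 and 2 must coincide
              [1.0,  0.0, 1.0]])  # benchmarking constraint: entries 1 and 3 add up to 155
b = np.array([0.0, 155.0])
x_tilde = reconcile(x, np.eye(3), C, b)
print(x_tilde)
```

With the identity matrix as weighting matrix all entries are treated as equally reliable; passing a diagonal `A_inv` with variances on the diagonal shifts the adjustment towards the less reliable entries.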

Comparison with the GREG-estimator
In survey methodology it is common to make use of known marginal totals of variables that are also measured in the survey, through calibration or generalized regression (GREG) estimation; see, e.g., Särndal, Swensson, and Wretman (1992). Following Boonstra (2004), we compare in this subsection the GREG-estimator with the adjusted estimator given by Equation (4) for the estimation of contingency tables with known margins.
The situation in which calibration or GREG-estimation procedures can be applied is as follows.
There is a target variable $y$, measured on a sample of $n$ units, for which the population total, $x_y$ say, is to be estimated. Furthermore, there are measurements on a vector of $q$ auxiliary variables on these same units, for which the population totals are known. For the application of the GREG-estimator for the total of $y$, first the regression coefficients for the regression of $y$ on the auxiliary variables are calculated. Let the measurements on $y$ be collected in the $n$-vector $y$ with elements $y_i$ ($i = 1, \ldots, n$), and the measurements on the auxiliary variables in vectors $z_i$, and let $Z$ be the $n \times q$ matrix with the vectors $z_i'$ as rows. The design-based estimator of the regression coefficient vector $\beta$ can then be obtained as the weighted least squares estimator
$$\hat{\beta} = (Z'\Pi^{-1}Z)^{-1} Z'\Pi^{-1} y, \tag{5}$$
with $\Pi$ a diagonal matrix with the sample inclusion probabilities $\pi_i$ along the diagonal.
Using these regression coefficients, the regression estimator for the population total of $y$ is given by
$$\hat{x}_{y.greg} = \hat{x}_{y.ht} + (x_{z.pop} - \hat{x}_{z.ht})'\hat{\beta}, \tag{6}$$
with $\hat{x}_{y.ht}$ and $\hat{x}_{z.ht}$ the 'direct' Horvitz-Thompson estimators, $\sum_i y_i/\pi_i$ and $\sum_i z_i/\pi_i$, for the population totals of $y$ and $z$, respectively, and $x_{z.pop}$ the known population totals of the auxiliary variables. The regression estimator $\hat{x}_{y.greg}$ can be interpreted as a 'weighting' estimator of the form $\sum_i w_i y_i$, with the weights $w_i$ given by
$$w_i = \frac{1}{\pi_i}\left[1 + (x_{z.pop} - \hat{x}_{z.ht})'(Z'\Pi^{-1}Z)^{-1} z_i\right]. \tag{7}$$
From (7) two important properties of the GREG-estimator are directly apparent. Firstly, the weights depend only on the auxiliary variables and not on the target variable. This means that the GREG-estimators for different target variables can be obtained with the same weights as long as the auxiliary variables remain the same. Secondly, the GREG-estimates of the totals of the auxiliary variables, $\hat{x}_{z.greg} = \sum_i w_i z_i$, are equal to their known population totals.
For a vector of $p$ target variables, the regression estimator generalizes to
$$\hat{x}_{y.greg} = \hat{x}_{y.ht} + \hat{B}(x_{z.pop} - \hat{x}_{z.ht}), \tag{8}$$
with $\hat{x}_{y.ht}$ the $p$-vector with Horvitz-Thompson estimators for the target variables and $\hat{B}$ the $p \times q$ matrix with the regression coefficients for each target variable on the rows. Generalizing (5), we have for the coefficient matrix $\hat{B} = Y'\Pi^{-1}Z(Z'\Pi^{-1}Z)^{-1}$, where $Y$ is the $n \times p$ matrix with the vectors of target variables, $y_i'$, on the rows.
Now, consider the case where the totals to be estimated are the cell totals of a contingency table obtained by the cross-classification of a number of categorical variables. For instance, the target totals could be the numbers of individuals in the categories 1. Unemployed and 2. Employed of the variable Employment, by age category and sex, in some (sub)population. If we assume, for ease of exposition, that Age has only two categories, 1. Young and 2. Old, and Sex has the categories 1. Male and 2. Female, then there are eight totals to be estimated, one for each cell of a $2 \times 2 \times 2$ contingency table. Corresponding to each of these eight cells we can define, for each individual, a zero-one target variable indicating whether the individual belongs to this cell or not. For instance, $y_1 = 1$ if Employment = 1, Age = 1 and Sex = 1, and zero in all other cases; $y_2 = 1$ if Employment = 2, Age = 1 and Sex = 1, and zero in all other cases; and so on. Each individual scores a 1 on one and only one of the eight target variables.
For such tables, some of the marginal totals are often known for the population, and GREG-estimators that take this information into account are commonly applied. In the example above, the population totals of the combinations of Sex and Age could be known, and the auxiliary variables then correspond to each of the combinations of Sex and Age. The values for the individuals on these auxiliary variables are sums of values of the target variables. For instance, the auxiliary variable for Age = 1 and Sex = 1 is the sum of $y_1$ and $y_2$; it has the value 1 for individuals that are young and male, whether employed or unemployed, and the value 0 for individuals that are not both young and male. Similarly, we obtain for each of the four Age × Sex combinations a zero-one auxiliary variable as the sum of the corresponding target variables for Unemployed and Employed. In general, if there are $p$ target variables and $q$ auxiliary variables corresponding to sums of target variables, we can write the values of the auxiliary variables as
$$z_i = C y_i, \tag{9}$$
with $C$ the $q \times p$ constraint matrix (consisting of zeroes and ones) that generates the sums of the $y_i$ values corresponding to the auxiliary variables. Since (9) applies to each row of $Z$ and $Y$, we can write $Z = YC'$ and so
$$\hat{B} = Y'\Pi^{-1}YC'(CY'\Pi^{-1}YC')^{-1}. \tag{10}$$
In the case considered here, where the target variables correspond to cells in a cross-classification of categorical variables, this expression can be simplified as follows. The rows of $Y$ contain a 1 in the column corresponding to the cell to which the unit belongs and zeroes elsewhere.
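After rearranging the rows such that the units that belong to the same cell (score a one on the same target variable) are beneath each other, $Y$ can be written in block form as
$$Y = \begin{pmatrix} 1_{n_1} & 0 & 0 & 0 & \cdots \\ 0 & 1_{n_2} & 0 & 0 & \cdots \\ 0 & 0 & 0 & 1_{n_4} & \cdots \\ \vdots & \vdots & \vdots & \vdots & \ddots \end{pmatrix},$$
where $n_j$ is the number of units scoring a one on target variable $j$ and $1_{n_j}$ is a column with $n_j$ ones. In this example there are no units that score on the third target variable. When this matrix is premultiplied by $Y'\Pi^{-1}$, the result is a diagonal matrix with the Horvitz-Thompson estimates of the cell totals along the diagonal, $Y'\Pi^{-1}Y = \mathrm{diag}(\hat{x}_{y.ht})$. Substituting this in the expression for the coefficient matrix $\hat{B}$ and in the multivariate GREG-estimator, and noting that $\hat{x}_{z.ht} = C\hat{x}_{y.ht}$, we obtain
$$\hat{x}_{y.greg} = \hat{x}_{y.ht} + \mathrm{diag}(\hat{x}_{y.ht})\,C'\left(C\,\mathrm{diag}(\hat{x}_{y.ht})\,C'\right)^{-1}(x_{z.pop} - C\hat{x}_{y.ht}),$$
which is equal to (4) with the initial unadjusted vector ($x$) equal to the Horvitz-Thompson estimators for the cell totals, the weighting matrix ($A^{-1}$) a diagonal matrix with the initial vector along the diagonal, and the values of the constraints ($b$) equal to the known population totals of the margins of the contingency table that are used as auxiliary variables.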
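The equivalence stated above can be checked numerically. The following is a minimal sketch under assumed inputs: a hypothetical sample of 200 units classified into four cells, randomly drawn inclusion probabilities, and two hypothetical known margins. It computes the GREG estimate from the regression coefficient matrix and the macro-integration estimate with a diagonal weighting matrix built from the Horvitz-Thompson cell totals, and the two should coincide.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 4
cells = rng.integers(0, p, size=n)       # cell membership of each sampled unit
cells[:p] = np.arange(p)                 # make sure every cell is observed at least once
Y = np.eye(p)[cells]                     # n x p zero-one target variables
pi = rng.uniform(0.05, 0.5, size=n)      # hypothetical inclusion probabilities

C = np.array([[1.0, 1.0, 0.0, 0.0],      # q x p: each margin is a sum of two cells
              [0.0, 0.0, 1.0, 1.0]])
Z = Y @ C.T                              # auxiliary variables as sums of target variables
x_pop = np.array([900.0, 1100.0])        # hypothetical known population margins

Pi_inv = np.diag(1.0 / pi)
x_ht = Y.T @ (1.0 / pi)                  # Horvitz-Thompson estimates of the cell totals

# GREG-estimator with the p x q coefficient matrix
B = Y.T @ Pi_inv @ Z @ np.linalg.inv(Z.T @ Pi_inv @ Z)
x_greg = x_ht + B @ (x_pop - C @ x_ht)

# Macro-integration solution with A^{-1} = diag(x_ht) and b = x_pop
A_inv = np.diag(x_ht)
x_macro = x_ht + A_inv @ C.T @ np.linalg.solve(C @ A_inv @ C.T, x_pop - C @ x_ht)

print(np.allclose(x_greg, x_macro))
```

Both estimates also reproduce the known margins, which is the calibration property of the GREG-estimator.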

Extension to time series data
The optimization problem described in Section 2.1 can be extended to time series data. Suppose that our data consist of $N$ variables, each measured at $T$ time points. We define these data $x_{it}$ ($i = 1, \ldots, N$, $t = 1, \ldots, T$) as $N$ time series, each of length $T$. In this case the total number of variables $x_{it}$ is $N \cdot T$ and the constraint matrix will have $N \cdot T$ columns. The number of rows will be equal to the number of constraints, as before. The matrix $A$ will be a symmetric, nonsingular $NT \times NT$ matrix.
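For these data we want to find adjusted values $\tilde{x}_{it}$ that are, in some metric $\varsigma$ (for example the Euclidean metric), close to the original time series. For this purpose we consider the following objective function
$$\min_{\tilde{x}} \sum_{i=1}^{N}\sum_{t=1}^{T} \frac{1}{w_{it}}\,\varsigma(\tilde{x}_{it}, x_{it}), \tag{13}$$
where $w_{it}$ denotes the variance of the $i$th time series at time $t$. We minimize this function over all $\tilde{x}_{it}$ satisfying the constraints
$$\sum_{i=1}^{N}\sum_{t=1}^{T} c_{rit}\,\tilde{x}_{it} = b_r, \quad r = 1, \ldots, C. \tag{14}$$
In (14), $r$ is the index of the restrictions and $C$ is the number of restrictions. Furthermore, $c_{rit}$ is an entry of the restriction matrix and the $b_r$ are fixed constants. Most economic variables cannot have negative signs. To incorporate this (and other) requirement(s) in the model, inequality constraints are included. A set of inequalities is given by
$$\sum_{i=1}^{N}\sum_{t=1}^{T} a_{rit}\,\tilde{x}_{it} \le z_r, \quad r = 1, \ldots, I, \tag{15}$$
where $I$ stands for the number of inequality constraints, and $a_{rit}$ and $z_r$ denote the corresponding coefficients and bounds.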
In Bikker et al. (2013) this model was extended with soft linear and ratio restrictions. A soft equality constraint differs from the hard equality constraints (14) in that the constants $b_r$ are not fixed quantities but are assumed to have a variance and an expected value. This means that the resulting $\tilde{x}_{it}$ need not match the soft constraints exactly, but only approximately. A soft linear constraint similar to (14) is denoted as follows:
$$\sum_{i=1}^{N}\sum_{t=1}^{T} c_{rit}\,\tilde{x}_{it} \sim (b_r, w_r). \tag{16}$$
By the notation $\sim$ in (16) we define $b_r$ to be the expected value of the sum $\sum_{i=1}^{N}\sum_{t=1}^{T} c_{rit}\,\tilde{x}_{it}$ and $w_r$ its variance. In the case that $\varsigma$ is the Euclidean metric, the linear soft constraints can be incorporated in the model by adding the following term to the objective function in (13):
$$\frac{1}{w_r}\left(b_r - \sum_{i=1}^{N}\sum_{t=1}^{T} c_{rit}\,\tilde{x}_{it}\right)^2. \tag{17}$$
Another important extension of the model in Bikker et al. (2013) is the ratio constraint. The hard and soft ratio constraints that can be added to the model are given by
$$\frac{\tilde{x}_{nt}}{\tilde{x}_{dt}} = v_{ndt} \quad \text{and} \quad \frac{\tilde{x}_{nt}}{\tilde{x}_{dt}} \sim (v_{ndt}, w_{ndt}), \tag{18}$$
where $\tilde{x}_{nt}$ denotes the numerator time series, $\tilde{x}_{dt}$ denotes the denominator time series, $v_{ndt}$ is some predetermined value and $w_{ndt}$ denotes the variance of the ratio $\tilde{x}_{nt}/\tilde{x}_{dt}$. In order to add the soft ratio constraints to the objective function, these are first linearized. The soft constraints in (18) can be rewritten as:
$$\tilde{x}_{nt} - v_{ndt}\,\tilde{x}_{dt} \sim (0, w^{*}_{ndt}). \tag{19}$$
The variance of the linearized constraint will be different; we denote it as $w^{*}_{ndt}$. When $\varsigma$ is the Euclidean metric, soft linearized ratios are incorporated in the model by adding the following term to the objective function:
$$\frac{(\tilde{x}_{nt} - v_{ndt}\,\tilde{x}_{dt})^2}{w^{*}_{ndt}}. \tag{20}$$
The inclusion of soft and ratio constraints in the model makes it possible to handle macro-economic relations between variables that are beyond the traditional linear (in)equality constraints. It opens up a number of applications to reconciliation problems in several areas. An example of one such application is described in Section 4.
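How soft constraints enter the problem can be sketched for the Euclidean case. The sketch below assumes that each soft constraint contributes a squared deviation divided by its variance to the objective, that a soft ratio constraint is used in its linearized form (numerator minus target ratio times denominator, with expected value zero), and that the hard constraints are handled through the usual Lagrange (KKT) system. The function name `reconcile_soft` and all numbers are hypothetical; this is not the solver used in Bikker et al. (2013), and inequality constraints are left out.

```python
import numpy as np

def reconcile_soft(x, w, C_hard, b_hard, S, b_soft, w_soft):
    """Weighted least squares adjustment with hard and soft linear constraints.

    x, w      : initial values and their variances
    C_hard, b_hard : hard constraints C_hard @ x_tilde = b_hard
    S, b_soft, w_soft : soft constraints S @ x_tilde ~ (b_soft, w_soft)
    """
    x = np.asarray(x, dtype=float)
    A = np.diag(1.0 / np.asarray(w, dtype=float))
    W = np.diag(1.0 / np.asarray(w_soft, dtype=float))
    H = A + S.T @ W @ S                     # quadratic part including soft penalty terms
    g = A @ x + S.T @ W @ b_soft            # linear part
    c = C_hard.shape[0]
    K = np.block([[H, C_hard.T],
                  [C_hard, np.zeros((c, c))]])
    sol = np.linalg.solve(K, np.concatenate([g, b_hard]))
    return sol[: x.size]

# Hypothetical data: numerator, denominator and one other series at a single time point.
x = np.array([60.0, 45.0, 100.0])
w = np.array([4.0, 4.0, 25.0])
C_hard = np.array([[1.0, 1.0, 1.0]])        # hard constraint: the three values add up to 210
b_hard = np.array([210.0])
S = np.array([[1.0, -1.5, 0.0]])            # linearized ratio: x_n - 1.5 * x_d ~ (0, w*)
b_soft = np.array([0.0])

x_loose = reconcile_soft(x, w, C_hard, b_hard, S, b_soft, np.array([1.0]))
x_tight = reconcile_soft(x, w, C_hard, b_hard, S, b_soft, np.array([1e-6]))
print(x_loose[0] / x_loose[1], x_tight[0] / x_tight[1])
```

As the variance of the soft ratio constraint shrinks, the solution approaches the one obtained with a hard ratio constraint, while the hard linear constraint is met exactly in both cases.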

Reconciliation of census tables
In this section we describe the Dutch Census data and formulate the reconciliation of census tables as a macro-integration problem.
The aim of Census 2011 is to produce 60 multi-dimensional cross-classifications (we will call these hypercubes here) about demographics and occupation. For each of these hypercubes, figures should be produced for the whole Dutch population, for each province and for each municipality, which results in a great number of hypercubes. For this task, data from many different sources and with different structures are combined. The majority of the variables are obtained from the GBA (population register); however, quite a few other sources (sample surveys and registers) are used as well, such as the labour force survey (LFS).
Each table consists of up to 10 variables. Most of the variables are included in many hypercubes. The hypercubes have to be consistent with each other, in the sense that all marginal distributions that can be obtained from different cross-tables are the same. Consistency is required for one-dimensional marginals, e.g. the number of men, as well as for multivariate marginals, e.g. the number of divorced men aged between 25 and 30 years.
In different hypercubes, the same variable may have a different category grouping (classification).For example, the variable age can be requested to be included in different hypercubes aggregated in different levels of detail: groups of ten years, five years and one year.Still, the marginal distributions of age obtained from different hypercubes should be the same for each level of aggregation.
In general, the data that are collected by Statistics Netherlands (SN) involve many inconsistencies. The causes vary: different sources, differences in population coverage, different time periods of data collection, and different nonresponse correction methods.
Currently at SN, the method of repeated weighting is used to combine variables from different sources and to make them consistent (Houbiers 2004). Using repeated weighting, tables are reconciled one by one. Assuming that tables 1 to t are correct, these figures are fixed. Then, the method of repeated weighting adjusts table t + 1, so that all margins of this table become consistent with the margins of all previous tables, 1 to t. The method of repeated weighting was successfully used for the last census in 2001. However, the number of tables has increased since, and with the number of tables the number of restrictions has also increased. As a consequence, it is not obvious that the method of repeated weighting will work for the Census 2011.
The method of macro-integration has some advantages over repeated weighting. Firstly, the method of macro-integration reconciles all tables simultaneously, meaning that none of the figures need to be fixed during the reconciliation process. As a result, there are more degrees of freedom to find a solution than in the method of repeated weighting. Therefore a better solution may be found, which requires less adjustment than repeated weighting. Secondly, the results of repeated weighting depend on the order in which the different tables are weighted, while the macro-integration approach does not require any order. Thirdly, the method of macro-integration allows inequality constraints, soft constraints and ratio constraints, which may be used to obtain better results.
A disadvantage of macro-integration is that a very large optimization problem has to be solved. However, by using up-to-date solvers of mathematical optimization problems, very large problems can be handled. The software that has been built at Statistics Netherlands for the reconciliation of National Accounts tables is capable of dealing with a large number of variables (500 000) and restrictions (200 000). This software is built around the commercial optimization solver XPRESS.
We should emphasize that reconciliation should be applied at the macro level. First, imputation and editing techniques should be carried out for each source separately at the micro level. The aggregated tables should then be produced, containing variables at the publication level. Furthermore, for each separate aggregated table, the variance of each entry in the table should be computed, or at least an indication of the reliability of the entry should be defined. For example, an administrative source will in general have the most reliable information, and hence a very high reliability. For the entries where no variance is available, a reliability weight can be defined using the knowledge and experience of subject matter specialists. In our case the specialists group the data entries into different reliability classes and assign weights to each class; for a more detailed description see Bikker et al. (2013). During the reconciliation process, each entry of all tables will be adapted in such a way that the entries that are least reliable will be adapted the most, until all constraints are met.
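The effect of reliability weights can be illustrated for the simplest case of a single additive constraint. The sketch below assumes that reliability is expressed as a variance (a small variance means a reliable entry) and that the entries have to add up to a fixed total; under the quadratic objective of Section 2, the discrepancy is then distributed over the entries in proportion to their variances. The function name and the numbers are hypothetical.

```python
def adjust_to_total(values, variances, total):
    """Distribute the discrepancy with a fixed total in proportion to the variances."""
    gap = total - sum(values)          # discrepancy that has to be removed
    scale = gap / sum(variances)       # Lagrange multiplier for the single constraint
    return [v + var * scale for v, var in zip(values, variances)]

# Two entries with the same initial value; the second is nine times less reliable.
adjusted = adjust_to_total([100.0, 100.0], [1.0, 9.0], 220.0)
print(adjusted)  # [102.0, 118.0]
```

The less reliable entry absorbs nine tenths of the discrepancy, which matches the principle that the least reliable entries are adapted the most.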
The procedure that we propose here is as follows:
1. For each data source, define the variables of interest;
2. Use imputation and editing techniques to improve data quality at the micro level;
3. Aggregate the data to produce the tables, and calculate the variance of each entry;
4. Use reconciliation to make the tables consistent, and calculate the covariance matrix for the reconciled table.
For step 4, we have identified a number of reconciliation problems for the census data:
I. Some variables will have different classifications. For example, the variable Age can be given in years, in five-year intervals or in ten-year intervals. It is required that the number of persons obtained from the hypercube with the variable Age in one-year intervals, for example from 10 to 20 years, adds up to the number of persons in this age interval obtained from any other hypercube where Age is measured in five-year or ten-year intervals. The objective function and the constraints can be set up to handle this problem.
II. Before achieving consistency between all hypercubes we have to estimate each hypercube. We assume that an initial estimate for each hypercube can be made. However, this is not necessarily straightforward, especially in the case of hypercubes that include variables from different data sources, for example a register and a sample. In Appendix A we present a real data example of how one can estimate the hypercubes.
III. A problem that has to be solved in any method is the lack of information. Part of the source information is based on samples. However, these samples may not cover each of the categories of the variables in the hypercubes. For instance, a sample may not include any immigrant from Bolivia, while this sample may be the only source for some of the variables in the census. In Daalmans (2013) a solution for this problem is described in more detail.
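Problem I, consistency between different classifications of the same variable, can be written as a set of linear constraints with an aggregation matrix. The following is a minimal sketch with hypothetical counts: one hypercube margin gives Age in one-year classes (ages 10 to 19), another gives the same persons in two five-year classes, and the constraint requires the aggregated one-year counts to equal the five-year counts. Equal weights are assumed for all cells.

```python
import numpy as np

# Hypothetical initial estimates from two hypercubes.
ages_1y = np.array([11.0, 9.0, 10.0, 12.0, 8.0, 10.0, 11.0, 9.0, 10.0, 10.0])  # ages 10..19
ages_5y = np.array([52.0, 47.0])                                                # 10-14, 15-19

# Aggregation matrix: each row sums five consecutive one-year classes.
G = np.kron(np.eye(2), np.ones((1, 5)))        # shape (2, 10)

# Stack all cells in one vector and require G @ x_1y - x_5y = 0.
x = np.concatenate([ages_1y, ages_5y])
C = np.hstack([G, -np.eye(2)])
b = np.zeros(2)

# Closed-form adjustment with the identity matrix as weighting matrix.
x_tilde = x + C.T @ np.linalg.solve(C @ C.T, b - C @ x)
new_1y, new_5y = x_tilde[:10], x_tilde[10:]
print(G @ new_1y, new_5y)
```

The same construction extends to ten-year classes by adding rows that sum ten one-year classes, or two five-year classes.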

The objective function
We distinguish two steps in making the census hypercubes:
1. First, the hypercubes should be made from all available sources;
2. Then, all hypercubes should be adjusted so that the same margins are equal.
Building the census hypercubes from different sources could be carried out using many different methods, like weighting or post-stratification. In Appendix A we present a simple example of making a hypercube using two different data sources. In this section we will not discuss these methods. From the macro-integration point of view, it is the second step of making the hypercubes that is of interest.
Using the notation from the previous section we can now apply the macro-integration method for reconciliation of the hypercubes by their common marginals. In the previous section we defined the objective function (13) using an arbitrary metric. Here we use a Euclidean metric.
We introduce the following notation for census data. For $j = 1, \ldots, N$, a hypercube is denoted by $H^{(j)}$. A marginal hypercube of $H^{(j)}$ is denoted by $M^{(j)}$. A variable in the hypercube $H^{(j)}$ is denoted by $x^{(j)}_i$, where the subindex $i$ denotes the variable, for example Province or Age, and the superindex $(j)$ identifies the hypercube in which the variable is included. For example, if we have two hypercubes $H^{(1)}$ and $H^{(2)}$, the variables from $H^{(1)}$ will be denoted by $x^{(1)}_1, x^{(1)}_2, \ldots, x^{(1)}_m$, assuming that the hypercube $H^{(1)}$ consists of $m$ variables. Suppose now that the hypercube $H^{(2)}$ consists of $n$ variables and that it has three variables $x_1$, $x_2$ and $x_4$ in common with the hypercube $H^{(1)}$. Denote the marginal hypercube of $H^{(1)}$ consisting of these variables by $M^{(1)}_{1,2,4}$:
$$M^{(1)}_{1,2,4} = x^{(1)}_1 \times x^{(1)}_2 \times x^{(1)}_4.$$
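Reconciling the hypercubes $H^{(1)}$ and $H^{(2)}$ so that their common marginal hypercubes are the same means finding hypercubes $\tilde{H}^{(1)}$ and $\tilde{H}^{(2)}$ such that
$$\varsigma(\tilde{H}^{(1)}, H^{(1)}) + \varsigma(\tilde{H}^{(2)}, H^{(2)}) \tag{21}$$
reaches its minimum under the condition that
$$\tilde{M}^{(1)}_{1,2,4} = \tilde{M}^{(2)}_{1,2,4}. \tag{22}$$
In the case that the first marginal hypercube $M^{(1)}_{1,2,4}$ consists of variables from a register, which are fixed and should not be reconciled, we will have the following instead of the condition in (22):
$$\tilde{M}^{(2)}_{1,2,4} = M^{(1)}_{1,2,4}. \tag{23}$$
We can now define the objective function for the reconciliation of the hypercubes $H^{(j)}$, $j = 1, \ldots, N$. We want to find the hypercubes $\tilde{H}^{(j)}$, $j = 1, \ldots, N$, such that
$$\min \sum_{j=1}^{N} \varsigma(\tilde{H}^{(j)}, H^{(j)}) \tag{24}$$
under the restriction that all common marginal hypercubes are the same:
$$\tilde{M}^{(j_1)}_{i,k,\ldots,l} = \cdots = \tilde{M}^{(j_n)}_{i,k,\ldots,l}. \tag{25}$$
These marginal hypercubes can include some register variables. However, there is no register data available for the combination of the variables $x_i, x_k, \ldots, x_l$. On the other hand, for the marginal hypercubes that consist of a combination of variables for which register data are available, we will have the following restriction:
$$\tilde{M}^{(j_1)}_{p,q,\ldots,s} = \cdots = \tilde{M}^{(j_n)}_{p,q,\ldots,s} = M^{register}_{p,q,\ldots,s}. \tag{26}$$
If we transform the hypercube $H^{(j)}$ into a vector $h^{(j)} = (h^{(j)}_1, \ldots, h^{(j)}_{c_j})'$, we can rewrite the objective function in (24) using the notation of the previous section. For all $h^{(j)}$, $j = 1, \ldots, N$, we want to find vectors $\tilde{h}^{(j)}$, $j = 1, \ldots, N$, such that
$$\min \sum_{j=1}^{N}\sum_{i=1}^{c_j} \frac{1}{w_{ij}}\left(\tilde{h}^{(j)}_i - h^{(j)}_i\right)^2, \tag{27}$$
where $w_{ij}$ is the weight of $h^{(j)}_i$.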

Reconciliation of two hypercubes
Suppose we want to create two hypercubes, each with three variables. Hypercube one, $H^{(1)}$, consists of the variables Gender, Age and Occupation, and the second hypercube, $H^{(2)}$, of the variables Gender, YAT (year of immigration) and Occupation. For convenience, we combine the original categories of these variables and consider the coding as presented in Table 1. Of these variables, the only one that is observed in the survey is Occupation; the other three variables are obtained from the register and are therefore assumed to be fixed. The survey we use here is the LFS (labour force survey) and the register is the GBA (population register). As mentioned already, we assume that the figures obtained from the GBA are exogenous, which means that these values should not be changed.
We aim to find the hypercubes $\tilde{H}^{(1)}$ and $\tilde{H}^{(2)}$ such that
$$\varsigma(\tilde{H}^{(1)}, H^{(1)}) + \varsigma(\tilde{H}^{(2)}, H^{(2)}) \tag{28}$$
is minimized under the restrictions that the marginal hypercubes of $\tilde{H}^{(1)}$ and $\tilde{H}^{(2)}$ coincide with the corresponding marginal hypercubes of the register. Hence we want to achieve that:
$$\tilde{M}^{(1)}_{Gender,\,Age} = M^{register}_{Gender,\,Age} \tag{29}$$
and
$$\tilde{M}^{(2)}_{Gender,\,YAT} = M^{register}_{Gender,\,YAT}. \tag{30}$$
The results of the weighting are presented in Tables 2 and 3 under column 0. Since we consider these figures as the starting figures before the reconciliation process, we call these model 0. These figures have marginals consistent with each other but not with the register data, see Table 4. For example, the total number of men is 8214119 according to Tables 2 and 3 and 8113730 according to Table 4.
We applied the optimization solver XPRESS to the problem defined in (28)-(31), using the Euclidean distance for ς and applying the weight 1 for all figures. The results of this reconciliation are presented in Tables 2 and 3 under column I. We observed negative figures after the reconciliation; we therefore added the restriction that all figures have to be nonnegative to the previous setting and applied the solver again. The results of this optimization problem are presented in Tables 2 and 3 under column II. Next we used weights equal to the initial value of each figure. The results are to be found under column III in Tables 2 and 3. Applying more realistic weights led to different results compared with models I and II.
Since we want to preserve the initial marginal distribution of the variable Occupation, the next step is to add a ratio restriction. We added only one ratio restriction, namely the relation between the managers and the non-managers for the whole population. We first added this restriction as a hard constraint and afterwards as a soft constraint. The results of these reconciliation problems are presented in columns IV and V of Tables 2 and 3. For the soft restriction the weight we chose is 707405400, which is in the order of 100 times the largest register value. This value was found by trial and error. With this value the ratio constraint significantly influences the results, but its effect is clearly less than that of a hard ratio constraint. In Table 5 the ratio of the number of 'not manager' over the number of 'manager' is given for models III, IV and V. The target value of the ratio is the ratio observed in the LFS. As could be expected, the target value is best achieved in model IV, in which the hard ratio restriction has to be fulfilled.
To compare the results of the models with each other we calculated the weighted quadratic difference between the reconciled values of models III, IV and V and the values of model 0, the hypercubes after the weighting, see Table 6.
The weighted squared difference in Table 6 is calculated as follows:
$$\sum_{j=1}^{2}\sum_{i} \frac{1}{w_{ij}}\left(\tilde{h}^{(j)}_i - h^{(j)}_i\right)^2,$$
where we sum over the two hypercubes, $\tilde{h}^{(j)}_i$ are the reconciled figures of model III, IV or V, $h^{(j)}_i$ are the values of model 0 and $w_{ij}$ are the weights. The weighted squared difference is smallest for model III, which implies that without the ratio restriction the reconciled figures are closer to the original figures. We could anticipate this result, since the ratio restriction (as any additional restriction would do) forces the original figures towards the distribution of the ratio, and therefore the outcome of the model with the hard ratio restriction differs most from the initial values.
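The comparison measure can be sketched as a small function. It assumes that the weighted squared difference is the sum, over all cells of both hypercubes, of the squared difference between reconciled and initial figure divided by the weight of the cell; the function name and the example figures are hypothetical and are not taken from Tables 2, 3 or 6.

```python
def weighted_squared_difference(reconciled, initial, weights):
    """Sum of squared differences between reconciled and initial figures,
    each divided by the weight of the cell."""
    return sum((r - h) ** 2 / w for r, h, w in zip(reconciled, initial, weights))

# Hypothetical cells, with weights equal to the initial values (as in model III).
initial = [100.0, 200.0]
reconciled = [105.0, 195.0]
d = weighted_squared_difference(reconciled, initial, initial)
print(d)  # 0.375
```

To compare models, the cells of both hypercubes would be concatenated into one list per model before calling the function.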

Reconciliation of turnover figures
The second application of our macro-integration method is the reconciliation of turnover figures for short term statistics (STS). Currently Statistics Netherlands is investigating the possibility of using a macro-integration method for this reconciliation. Monthly STS figures are partly based on a sample and partly on full-scale business reports. Small and medium-sized businesses are included in the sample and all of the large businesses are approached. From these figures the business statistics department estimates the monthly turnover index for each sector for the Netherlands. On the other hand, we also have quarterly and yearly turnover figures of structural business statistics (SBS) based on register information. The monthly STS figures at the macro level should be consistent with the quarterly SBS figures. This condition should hold for the monthly and quarterly changes. For subject matter specialists the precise values of these figures are less important than the changes. Also, since the monthly and quarterly figures are obtained from different sources, the obvious choice is to consider the changes. In our application, quarterly figures are considered to be reliable and are assumed to be fixed. In general, only those quarterly figures that are already published will be fixed.
Let us consider three STS monthly series of turnover indices for the industry "household appliances manufacture", see Figure 1. These monthly figures are: the index of total turnover $I^T_{m,i}$, the index of domestic turnover $I^D_{m,i}$ and the index of foreign turnover $I^F_{m,i}$. We consider these figures for nine months ($i = 1, \ldots, 9$). For each series we have the corresponding quarterly SBS index, denoted by $I^T_{q,k}$, $I^D_{q,k}$ and $I^F_{q,k}$ ($k = 1, \ldots, 3$). These quarterly values are the benchmarks for our monthly series, since we take these quarterly turnover indices as fixed.
On the other hand for the quarterly SBS turnover figures, subject matter specialists put the following constraint on the three indices: These equations reflect the relative share of the domestic and foreign turnover in total turnover in the base period.
Here we consider two different approaches for reconciliation of the STS series.In the first approach we assume that the monthly total turnover index has been adjusted pro rata already and we will apply a macro-integration method to reconcile domestic and foreign monthly indices.In the second approach we will take the original figures of the monthly total turnover index and apply a macro-integration method to reconcile the three time series (total, domestic and foreign turnover) simultaneously.
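A pro rata adjustment, as used for the total turnover index in the first approach, can be sketched as follows. The sketch assumes that consistency means that the average of the three monthly indices in a quarter equals the quarterly index, and that the adjustment multiplies the months of a quarter by a common factor; the function name and the figures are hypothetical.

```python
def pro_rata(monthly, quarterly):
    """Scale the three months of each quarter by a common factor so that
    their average equals the (fixed) quarterly figure."""
    adjusted = []
    for k, target in enumerate(quarterly):
        months = monthly[3 * k: 3 * k + 3]
        factor = 3.0 * target / sum(months)   # same relative change for each month
        adjusted.extend(m * factor for m in months)
    return adjusted

# Hypothetical monthly indices for two quarters and their quarterly benchmarks.
monthly = [100.0, 110.0, 120.0, 90.0, 100.0, 110.0]
quarterly = [121.0, 95.0]
adjusted = pro_rata(monthly, quarterly)
print(adjusted)
```

Pro rata adjustment preserves the relative monthly changes within a quarter but can introduce a jump between the last month of one quarter and the first month of the next, which is what a Denton-type objective based on monthly changes is designed to avoid.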

Pro rata approach
Suppose that the monthly total turnover figures $I^T_{m,i}$ are adjusted pro rata, and that the pro rata estimate $\tilde{I}^T_{m,i}$ satisfies the following constraint:
$$\frac{1}{3}\sum_{i=3k-2}^{3k} \tilde{I}^T_{m,i} = I^T_{q,k}, \quad k = 1, 2, 3.$$
So we have original figures of the monthly domestic and foreign turnover indices and pro rata adjusted figures of the total turnover indices, see Table 7. The quarterly figures are given in Table 8. We want to find the estimates $\tilde{I}^D_{m,i}$ and $\tilde{I}^F_{m,i}$ of our monthly series such that:
1. Monthly changes of the domestic and foreign turnover indices are preserved as much as possible;
2. The averages of the monthly domestic and foreign turnover indices are equal to the corresponding quarterly turnover index:
$$\frac{1}{3}\sum_{i=3k-2}^{3k} \tilde{I}^D_{m,i} = I^D_{q,k}, \qquad \frac{1}{3}\sum_{i=3k-2}^{3k} \tilde{I}^F_{m,i} = I^F_{q,k}, \quad k = 1, 2, 3; \tag{35}$$
3. All quarterly figures and monthly total turnover figures are fixed, and the monthly figures of the domestic and foreign turnover indices can be adjusted;
4. For each month, the constraints in (36) should hold.
We can now specify the objective function for this problem. We assume here that the metric ς in (13) is the Euclidean metric:
$$\min \sum_{i=2}^{9}\left[\frac{1}{v_D}\Big(\big(\tilde{I}^D_{m,i} - \tilde{I}^D_{m,i-1}\big) - \big(I^D_{m,i} - I^D_{m,i-1}\big)\Big)^2 + \frac{1}{v_F}\Big(\big(\tilde{I}^F_{m,i} - \tilde{I}^F_{m,i-1}\big) - \big(I^F_{m,i} - I^F_{m,i-1}\big)\Big)^2\right] \tag{37}$$
under the constraints defined in (35) and (36). Here $v_D$ and $v_F$ denote the weights of the series $I^D_{m,i}$ and $I^F_{m,i}$, respectively. In this example we have two kinds of hard constraints: within the same time period and over three time periods. We have no soft constraints. The first term in (37) guarantees that the monthly changes $I^D_{m,i} - I^D_{m,i-1}$ are preserved as much as possible, and the second term serves the same purpose for the $I^F_{m,i}$ series. We consider two different pairs of weights for the monthly series. Accordingly, we have two different scenarios for the data integration problem. In the first scenario we assume that both series have the same weight, equal to 1. In the second scenario we assume that the weight of the domestic turnover series is equal to 1 and the weight of the foreign turnover series is equal to 0.1.
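The idea of preserving monthly changes subject to quarterly benchmarks can be sketched for a single series. The sketch below is a simplification under stated assumptions: it handles one monthly series only, it uses the quarterly average constraint, it leaves out the monthly constraints that link the domestic and foreign series to the total, and it solves the Lagrange (KKT) system directly with NumPy instead of the iterative algorithm used in the paper. The function name and the figures are hypothetical.

```python
import numpy as np

def denton_benchmark(monthly, quarterly):
    """Adjust a monthly series so that quarterly averages match the benchmarks
    while changing the month-to-month differences as little as possible."""
    x = np.asarray(monthly, dtype=float)
    q = np.asarray(quarterly, dtype=float)
    T, K = x.size, q.size                                # requires T == 3 * K
    D = np.diff(np.eye(T), axis=0)                       # (T-1) x T first-difference operator
    B = np.kron(np.eye(K), np.full((1, 3), 1.0 / 3.0))   # K x T quarterly averages
    # Minimize ||D d||^2 over the adjustments d, subject to B (x + d) = q.
    kkt = np.block([[2.0 * D.T @ D, B.T],
                    [B, np.zeros((K, K))]])
    rhs = np.concatenate([np.zeros(T), q - B @ x])
    d = np.linalg.solve(kkt, rhs)[:T]
    return x + d

# Hypothetical monthly indices for nine months and three quarterly benchmarks.
monthly = np.array([98.0, 101.0, 104.0, 103.0, 107.0, 110.0, 108.0, 112.0, 115.0])
quarterly = np.array([103.0, 108.0, 113.0])
reconciled = denton_benchmark(monthly, quarterly)
print(reconciled.round(2))
```

With a single series the weight only rescales the objective and drops out; with several series linked by constraints, each squared term would be divided by the weight of its series, so that the series with the smaller weight keeps its monthly changes better.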
Using the statistical software package R, we programmed an iterative algorithm, described in e.g. De Waal, Pannekoek, and Scholtus (2011), Ch. 10, to solve the optimization problem defined in (35)-(37), with the weights $v_D = v_F = 1$ for scenario 1 and $v_D = 1$, $v_F = 0.1$ for scenario 2. To illustrate the preservation of changes, we present the original and the reconciled series separately for domestic and foreign turnover in Figures 2 and 3. In scenario 1 both time series are treated as equally reliable. In scenario 2, however, we assume that the foreign turnover figures are more reliable than the domestic turnover figures. As a result, the reconciled foreign turnover figures in scenario 2 preserve the monthly changes much better than the reconciled domestic turnover figures. The weights thus allow extra information to be included in the model: if we know that the source of one series is more reliable than that of the other, we can express this by adapting the weights.
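The effect of the weights can be reproduced on a toy problem. The sketch below is not the iterative R algorithm used in the paper: it solves a small equality-constrained least-squares problem directly through its KKT system, in pure Python. Several things are assumptions made for illustration only: each squared deviation of a monthly change is divided by the weight of its series (so a smaller weight means a more reliable series), the monthly linking constraint has the form `share_dom * D + share_frn * F = T` with hypothetical turnover shares, only one quarter is treated, and all figures are invented. The quarterly constraint for the foreign series is left out because, with consistent inputs, it is implied by the other constraints and would make the system singular.

```python
def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system (redundant constraints?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def reconcile_quarter(dom, frn, total, q_dom, share_dom, share_frn, v_dom, v_frn):
    """Adjust domestic and foreign monthly indices of one quarter, preserving
    monthly changes as much as possible under hard equality constraints."""
    n = len(dom)
    k = 2 * n  # unknown adjustments: domestic months first, then foreign months
    # Quadratic form of sum_i (1/v) * (y_i - y_{i-1})^2, y = adjusted - original.
    q = [[0.0] * k for _ in range(k)]
    for offset, v in ((0, v_dom), (n, v_frn)):
        for i in range(1, n):
            a, b = offset + i - 1, offset + i
            q[a][a] += 1.0 / v
            q[b][b] += 1.0 / v
            q[a][b] -= 1.0 / v
            q[b][a] -= 1.0 / v
    orig = list(dom) + list(frn)
    # Hard constraints: quarterly average of the domestic series ...
    rows = [[1.0 / n] * n + [0.0] * n]
    rhs = [q_dom]
    # ... and one assumed linking constraint per month with the fixed total.
    for i in range(n):
        row = [0.0] * k
        row[i], row[n + i] = share_dom, share_frn
        rows.append(row)
        rhs.append(total[i])
    p = len(rows)
    resid = [rhs[j] - sum(rows[j][c] * orig[c] for c in range(k)) for j in range(p)]
    # KKT system: [Q A'; A 0] [y; lambda] = [0; resid]
    kkt = [q[r] + [rows[j][r] for j in range(p)] for r in range(k)]
    kkt += [rows[j] + [0.0] * p for j in range(p)]
    sol = solve_linear(kkt, [0.0] * k + resid)
    x = [orig[c] + sol[c] for c in range(k)]
    return x[:n], x[n:]


def movement_change(orig, new):
    """Sum of squared differences between original and adjusted monthly changes."""
    return sum(((new[i] - new[i - 1]) - (orig[i] - orig[i - 1])) ** 2
               for i in range(1, len(orig)))


# Hypothetical data for one quarter.
dom = [100.0, 104.0, 102.0]
frn = [90.0, 96.0, 93.0]
total = [98.0, 103.0, 101.0]  # fixed (e.g. pro rata adjusted) total index
d1, f1 = reconcile_quarter(dom, frn, total, 104.0, 0.6, 0.4, 1.0, 1.0)  # scenario 1
d2, f2 = reconcile_quarter(dom, frn, total, 104.0, 0.6, 0.4, 1.0, 0.1)  # scenario 2
```

Comparing `movement_change(frn, f1)` with `movement_change(frn, f2)` shows the mechanism discussed above: lowering the foreign weight to 0.1 can only reduce the distortion of the foreign monthly changes, at the expense of the domestic series.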

Macro-integration approach
If, instead of the pro rata adjusted series, we consider the original figures for the total turnover and include this series in the objective function as well, we obtain a new integration problem. In this case we have three time series to benchmark. We want to find estimates of the series $I^{T}_{m,i}$, $I^{D}_{m,i}$ and $I^{F}_{m,i}$ that minimise the objective function (38), in which $v_T$ denotes the weight of the series $I^{T}_{m,i}$.

In the previous subsection we first adjusted the monthly total turnover figures pro rata, and in the optimization problem (35)-(37) these figures were fixed. In this example we do not adjust the total turnover figures beforehand; we want to reconcile them simultaneously with the other figures. Therefore, for this problem, (36) changes into constraints that include the estimates of the total turnover indices. In addition, the constraints in (35) should now also hold for $\hat{I}^{T}_{m,i}$, the estimates of the total turnover indices. Together with the objective function, these constraints form the problem (38)-(40).

For the macro-integration problem in (38)-(40) we defined two scenarios that differ in the weights of the series. In the second scenario the weights are $v_T = v_F = 0.1$ and $v_D = 1$.
The estimates for these two scenarios were almost identical, see for example $I^{T}_{1}$ and $I^{T}_{2}$ in Table 9. It seems that the optimal estimates were found and that the weights made little difference.
Remark. In Figure 4 we compare the original figures of the total turnover with the adjusted figures from the two approaches described above: the macro-integration method (scenario 1) and the pro rata method. We can observe that the estimate obtained by the macro-integration method follows the monthly changes of the original time series better than the pro rata adjusted estimate, even though the difference between these estimates is minor.

Figure 4: Original and adjusted monthly total turnover figures

Of the two methods described above, we suggest using the full macro-integration method. It has several advantages:
1. The estimated time series of the total turnover follows the monthly changes of the original series;
2. The original figures of the total turnover do not have to be adjusted beforehand, so the integration process includes one step less;
3. The choice of the weights could become less important, and this may lead to better estimates.
This example illustrates the use of a macro-integration method for time series STS data.SN is currently carrying out research on how to apply macro-integration of STS figures in the production process.

Conclusions
Reconciliation of tables on a macro level can be very effective, especially when a large number of constraints should be fulfilled. Combining data sources of different structures on a macro level is often easier to handle than on a micro level. When data are very large and many sources should be combined, macro-integration seems to be the only technique that is effective. Macro-integration is also more versatile than (re-)weighting techniques using GREG-estimation in the sense that inequality constraints and soft constraints can be incorporated easily.
The two examples considered in this paper are of great importance for SN. For the census, further developing the macro-integration approach is very important, since the application of the repeated weighting method at SN is already hampered by its limitations. For this reason SN is using a combination of the weighting method and the macro-integration method. We feel that in the future macro-integration could be the only method used to ensure consistency of tables on a macro level.
The second application is equally, if not more, important for SN. For the past couple of years, SN has had an additional data source for business statistics figures. The use of register data has increased considerably over the last years. The quality of the data has also improved and, through intensive communication between SN and the registers, our knowledge of the register variables has increased. At the same time, SN has taken measures to improve the quality of the surveys. Improving the quality of the monthly and quarterly data creates the possibility of reconciling the survey data with the register data. At this moment SN is taking steps to implement this reconciliation.

Figure 1: Monthly turnover figures for household appliances manufacture

Figure 2: Original and reconciled domestic turnover figures

Figure 3: Original and reconciled foreign turnover figures

Table 1: Categories of the variables

Table 2: Hypercube 1

The first step before the actual reconciliation process is weighting up the sample to the population. The total number of GBA persons is $N_{GBA}$ = 16 408 487 and the total number of LFS persons is $N_{LFS}$ = 104 674. The initial weight is

Table 7: Monthly turnover indices of industry "household appliances manufacture"

Table 8: Quarterly turnover figures

Table 9: Reconciled figures of total, domestic and foreign turnover for scenarios 1 and 2.