Covariance Structure of Compositional Tables

Recent experience with the interpretation of orthonormal coordinates in compositional data shows that they need to be better understood in terms of logratios, which form the primary source of information within the logratio methodology. This is even more crucial in the special case of compositional tables, where both balances and coordinates with an odds ratio interpretation are involved. The aim of the paper is to provide a decomposition of the covariance structure of orthonormal coordinates in compositional tables in terms of logratio variances, which could serve this purpose. To improve interpretability, the formulas are accompanied by comments and graphical illustrations, and implications for the prominent case of 2 × 2 compositional tables are discussed.


Introduction
Although the logratio methodology is nowadays a well-established approach to the statistical analysis of compositional data, i.e. multivariate observations carrying relative information (Aitchison 1986; Pawlowsky-Glahn and Buccianti 2011), it is also suitable for more complex data structures where ratios rather than absolute values are of primary interest. One such structure is the compositional table (Egozcue, Díaz-Barrero, and Pawlowsky-Glahn 2008; Egozcue, Pawlowsky-Glahn, Templ, and Hron 2014; Fačevicová, Hron, Todorov, Guo, and Templ 2014a; Fačevicová, Hron, Todorov, and Templ 2014b), a continuous counterpart to the well-known contingency table (Agresti 2002). Besides the difference in the nature of the data (cells of a contingency table are discrete counts, while parts of a compositional table are continuous values), the main difference between them is that a contingency table collects results from n independent observations, whereas a compositional table itself represents one observation. The analysis of relationships between row and column factors is thus based on a sample of n compositional tables. Furthermore, by applying the Aitchison geometry (Egozcue and Pawlowsky-Glahn 2006) and following the principles of compositional data analysis, it is possible to decompose the original table into its independent and interactive parts (the latter capturing relations between both factors), while assuming geometric marginals instead of the standard arithmetic ones. Moreover, in Fačevicová et al. (2014b) orthonormal coordinates were assigned to both the independence and the interaction tables; these make it possible to perform statistical analysis using standard methods and to focus only on the coordinates of the interaction table. The problem of analyzing the relationship between factors from a sample of tables, which in the standard case would need to be handled using three-dimensional contingency tables or log-linear models, can thus be directly transferred to standard statistical treatment (like hypothesis testing) in coordinates; for example, independence of factors corresponds to zero coordinates of the interaction table. While coordinates of the independence table can be interpreted through balances (Egozcue and Pawlowsky-Glahn 2005), the coordinate representation of the interaction table needs to be formulated in terms of odds ratios (Fačevicová et al. 2014b). Although the motivation for the latter coordinates is quite intuitive, as odds ratios are also popular for representing contingency tables (Agresti 2002), interpretation of the coordinate system as a whole may seem too complex for practical purposes. To enhance interpretability, one possibility is to analyze the covariance structure of the coordinates (Fišerová and Hron 2011) to see which ratios contribute (in a positive or negative sense) to the values of individual variances and covariances. Nevertheless, the specific structure of compositional tables and their respective coordinates requires deeper insight, as balances form just one part of the coordinate system.
The aim of the presented paper is to analyze the covariance structure of orthonormal coordinates for compositional tables in terms of elements of the variation matrix (Aitchison 1986), i.e., as linear combinations of variances of single logratios, which seems to be a necessary step in the further development of any reasonable coordinate representation of compositional tables. The paper is organized as follows. In the next section, basics of compositional tables and their decomposition into independent and interactive parts are recalled. Section 3 is devoted to the covariance structure of the coordinates itself, where the corresponding formulas (which might seem rather complex) are always illustrated with a graphical scheme to make them easier to understand. In Section 4 some implications for the special case of 2 × 2 compositional tables are briefly mentioned. Section 5 presents an illustrative example and Section 6 concludes.

Orthonormal coordinates of I x J compositional tables
An I × J compositional table $\mathbf{x}$ is a special case of compositional data arranged in the form of a table in order to identify the relation between two (row and column) factors. It is formed by parts $x_{ij} > 0$, for $i = 1, 2, \dots, I$ and $j = 1, 2, \dots, J$, which carry only relative information. Consequently, their sum $\kappa$ is arbitrary (like $\kappa = 1$ for the case of proportions) and is reached formally using the closure operation

$$\mathcal{C}(\mathbf{x}) = \left( \frac{\kappa\, x_{ij}}{\sum_{k=1}^{I}\sum_{l=1}^{J} x_{kl}} \right)_{i,j=1}^{I,J}.$$
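The sample space of representations of I × J compositional tables is the simplex, an (IJ − 1)-dimensional subset of $\mathbb{R}^{IJ}$ defined as

$$\mathcal{S}^{IJ} = \Bigl\{ \mathbf{x} = (x_{ij})_{i,j=1}^{I,J} : x_{ij} > 0,\ \sum_{i=1}^{I}\sum_{j=1}^{J} x_{ij} = \kappa \Bigr\}.$$

To follow specific features of compositional tables (as a special case of compositional data), the Aitchison geometry on the simplex is defined, see Egozcue and Pawlowsky-Glahn (2006) for details. This geometry has the same algebraic-geometrical structure as the standard Euclidean geometry in real space and is represented by the operations of perturbation, power transformation, and the Aitchison inner product. According to Egozcue et al. (2008), these operations are defined for compositional tables $\mathbf{x}$ and $\mathbf{y}$ and $\alpha \in \mathbb{R}$ as

$$\mathbf{x} \oplus \mathbf{y} = \mathcal{C}\bigl(x_{ij}\, y_{ij}\bigr)_{i,j=1}^{I,J}, \qquad \alpha \odot \mathbf{x} = \mathcal{C}\bigl(x_{ij}^{\alpha}\bigr)_{i,j=1}^{I,J}, \qquad \langle \mathbf{x}, \mathbf{y} \rangle_A = \frac{1}{2IJ} \sum_{i,j} \sum_{k,l} \ln\frac{x_{ij}}{x_{kl}} \ln\frac{y_{ij}}{y_{kl}}.$$

The compositional table $\mathbf{n} = \mathcal{C}(x_{ij} = 1)_{i,j=1}^{I,J}$ stands for the neutral element in the (IJ − 1)-dimensional vector space $(\mathcal{S}^{IJ}, \oplus, \odot)$.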
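As a minimal sketch of these operations (not code from the cited references), the following Python functions implement the closure, perturbation, power transformation and Aitchison inner product for tables stored as nested lists. The function names and the default κ = 1 are illustrative assumptions, and the inner product is written in its pairwise-logratio form with the factor 1/(2IJ).

```python
import math


def closure(table, kappa=1.0):
    """Rescale all cells so that they sum to kappa (closure operation)."""
    total = sum(sum(row) for row in table)
    return [[kappa * v / total for v in row] for row in table]


def perturb(x, y):
    """Perturbation: cell-wise product followed by closure."""
    return closure([[a * b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)])


def power(alpha, x):
    """Power transformation: cell-wise power followed by closure."""
    return closure([[v ** alpha for v in row] for row in x])


def aitchison_inner(x, y):
    """Aitchison inner product from all pairwise logratios of the IJ parts."""
    fx = [v for row in x for v in row]  # flatten the tables row by row
    fy = [v for row in y for v in row]
    d = len(fx)
    s = sum(math.log(fx[p] / fx[q]) * math.log(fy[p] / fy[q])
            for p in range(d) for q in range(d))
    return s / (2 * d)
```

Because only ratios enter the inner product, rescaling a table by a positive constant leaves it unchanged, which is the scale invariance that motivates working with closed representations.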
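Consequently, for the mapping $h$ that assigns to a compositional table its vector of orthonormal coordinates, the following relations between the Aitchison and the Euclidean geometries can be derived,

$$h(\mathbf{x} \oplus \mathbf{y}) = h(\mathbf{x}) + h(\mathbf{y}), \qquad h(\alpha \odot \mathbf{x}) = \alpha\, h(\mathbf{x}), \qquad \langle \mathbf{x}, \mathbf{y} \rangle_A = \langle h(\mathbf{x}), h(\mathbf{y}) \rangle,$$

i.e. $h$ is an isometric mapping from $\mathcal{S}^{IJ}$ to $\mathbb{R}^{IJ-1}$ (we refer also to the isometric logratio (ilr) transformation (Egozcue, Pawlowsky-Glahn, Mateu-Figueras, and Barceló-Vidal 2003)).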
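Within the framework of the Aitchison geometry, a compositional table can be decomposed as $\mathbf{x} = \mathbf{x}^{ind} \oplus \mathbf{x}^{int}$ (Egozcue et al. 2008) into the independent part (independence table), with elements

$$x^{ind}_{ij} = \Bigl( \prod_{k=1}^{I} \prod_{l=1}^{J} x_{kj}\, x_{il} \Bigr)^{\frac{1}{IJ}}, \qquad (1)$$

and the interactive part (interaction table), with elements

$$x^{int}_{ij} = \Bigl( \prod_{k=1}^{I} \prod_{l=1}^{J} \frac{x_{ij}}{x_{kj}\, x_{il}} \Bigr)^{\frac{1}{IJ}}. \qquad (2)$$

According to Fačevicová et al. (2014b), the independence and the interaction tables can be expressed in I + J − 2 and (I − 1)(J − 1) nonzero orthonormal coordinates, respectively, the remaining coordinates (up to the total number of IJ − 1 variables) being zero. The coordinates of the independence table can be expressed as balances

$$z^{r}_{k} = \sqrt{\frac{(I-k)J}{1+I-k}}\, \ln \frac{\bigl(\prod_{j=1}^{J} x_{kj}\bigr)^{\frac{1}{J}}}{\bigl(\prod_{i=k+1}^{I} \prod_{j=1}^{J} x_{ij}\bigr)^{\frac{1}{(I-k)J}}}, \quad k = 1, \dots, I-1, \qquad (3)$$

and

$$z^{c}_{l} = \sqrt{\frac{(J-l)I}{1+J-l}}\, \ln \frac{\bigl(\prod_{i=1}^{I} x_{il}\bigr)^{\frac{1}{I}}}{\bigl(\prod_{j=l+1}^{J} \prod_{i=1}^{I} x_{ij}\bigr)^{\frac{1}{(J-l)I}}}, \quad l = 1, \dots, J-1, \qquad (4)$$

representing the row and column information (logratios), respectively, conveyed by the independence table; coordinates of the interaction table can be chosen as

$$z_{rs} = \frac{1}{\sqrt{rs(r-1)(s-1)}}\, \ln \prod_{i=1}^{r-1} \prod_{j=1}^{s-1} \frac{x_{ij}\, x_{rs}}{x_{is}\, x_{rj}}, \quad r = 2, \dots, I,\ s = 2, \dots, J, \qquad (5)$$

with an odds ratio structure. These two sets of coordinates together form an orthonormal coordinate representation of the original compositional table $\mathbf{x}$. The covariance structure in terms of elements of the variation matrix (Aitchison 1986), especially for coordinates of the interaction table, will be studied in detail in the next section.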
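The decomposition can be sketched numerically. The snippet below is an illustrative assumption rather than the authors' implementation: it takes the independence table to be the product of geometric row and column means (the "geometric marginals" mentioned in the Introduction) and the interaction table to be the cell-wise ratio of the original table to it. The function name `decompose` is hypothetical, and both outputs are left unclosed because closure does not change the relative information.

```python
import math


def decompose(table):
    """Split a positive I x J table into independence and interaction parts."""
    n_rows, n_cols = len(table), len(table[0])
    # Geometric mean of each row and of each column.
    row_gm = [math.exp(sum(math.log(v) for v in row) / n_cols) for row in table]
    col_gm = [math.exp(sum(math.log(table[i][j]) for i in range(n_rows)) / n_rows)
              for j in range(n_cols)]
    # Independence table: product of the geometric marginals.
    ind = [[row_gm[i] * col_gm[j] for j in range(n_cols)] for i in range(n_rows)]
    # Interaction table: what remains after dividing out the independent part.
    inter = [[table[i][j] / ind[i][j] for j in range(n_cols)]
             for i in range(n_rows)]
    return ind, inter
```

By construction, the cell-wise product of the two returned tables gives back the original table, and every odds ratio computed within the independence table equals one.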

Covariance structure of coordinates of the compositional table
In the following, the covariance structure of the above mentioned coordinate representation will be expressed as linear combinations of variances of logratios. At first, the covariance structure of the interaction table is introduced, followed by the independence table structure, and finally also the mutual relations between both tables (expressed through the corresponding covariances) are analyzed.
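Variances of logratios form the elemental information on variability in compositional tables and are summarized in the IJ × IJ variation matrix

$$\begin{pmatrix} \operatorname{var}\ln\frac{x_{11}}{x_{11}} & \operatorname{var}\ln\frac{x_{11}}{x_{12}} & \cdots & \operatorname{var}\ln\frac{x_{11}}{x_{IJ}} \\ \operatorname{var}\ln\frac{x_{12}}{x_{11}} & \operatorname{var}\ln\frac{x_{12}}{x_{12}} & \cdots & \operatorname{var}\ln\frac{x_{12}}{x_{IJ}} \\ \vdots & \vdots & \ddots & \vdots \\ \operatorname{var}\ln\frac{x_{IJ}}{x_{11}} & \operatorname{var}\ln\frac{x_{IJ}}{x_{12}} & \cdots & \operatorname{var}\ln\frac{x_{IJ}}{x_{IJ}} \end{pmatrix}.$$

As is usual within the logratio methodology, all coordinates are logcontrasts, i.e. they can be expressed in the form

$$\mathbf{a}^{\top} \ln \mathbf{x} = \sum_{i=1}^{I} \sum_{j=1}^{J} a_{ij} \ln x_{ij}, \qquad \sum_{i=1}^{I} \sum_{j=1}^{J} a_{ij} = 0.$$

The covariance structure can also be derived accordingly (Aitchison 1986).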
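Proposition 3.1 Variances and covariances for logcontrasts $\mathbf{a}^{\top} \ln \mathbf{x}$ and $\mathbf{b}^{\top} \ln \mathbf{x}$ of an IJ-part composition $\mathbf{x} = (x_1, \dots, x_{IJ})$ are

$$\operatorname{var}(\mathbf{a}^{\top} \ln \mathbf{x}) = -\frac{1}{2} \sum_{p=1}^{IJ} \sum_{q=1}^{IJ} a_p a_q \operatorname{var}\ln\frac{x_p}{x_q}, \qquad (7)$$

$$\operatorname{cov}(\mathbf{a}^{\top} \ln \mathbf{x}, \mathbf{b}^{\top} \ln \mathbf{x}) = -\frac{1}{2} \sum_{p=1}^{IJ} \sum_{q=1}^{IJ} a_p b_q \operatorname{var}\ln\frac{x_p}{x_q}. \qquad (8)$$

Given the logcontrast representation of coordinates (3), (4) and (5), Equations (7) and (8) are crucial for deriving their covariance structure. As the interaction table is usually of main interest for the analysis, we start with the variances of its respective coordinates.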
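The identity behind Proposition 3.1 (the covariance of two logcontrasts equals minus one half of the double sum of a_p b_q var ln(x_p/x_q)) can be checked numerically. The sketch below uses hypothetical simulated data (50 flattened 2 × 2 tables with log-normal parts) and illustrative helper names; it compares the value obtained from the variation matrix with the variance and covariance computed directly from the logcontrast scores.

```python
import math
import random


def sample_cov(u, v):
    """Unbiased sample covariance of two equally long sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (n - 1)


def variation_matrix(samples):
    """Matrix of var(ln(x_p / x_q)) over a sample of positive D-part vectors."""
    d = len(samples[0])
    logs = [[math.log(v) for v in s] for s in samples]
    tau = [[0.0] * d for _ in range(d)]
    for p in range(d):
        for q in range(d):
            lr = [row[p] - row[q] for row in logs]
            tau[p][q] = sample_cov(lr, lr)
    return tau


def logcontrast_cov(a, b, tau):
    """cov(a'ln x, b'ln x) from the variation matrix; a and b must sum to 0."""
    d = len(a)
    return -0.5 * sum(a[p] * b[q] * tau[p][q]
                      for p in range(d) for q in range(d))


# Hypothetical data: parts ordered as (x11, x12, x21, x22).
rng = random.Random(0)
samples = [[math.exp(rng.gauss(0.0, 1.0)) for _ in range(4)] for _ in range(50)]
tau = variation_matrix(samples)

a = [0.5, -0.5, -0.5, 0.5]  # log odds ratio coefficients
b = [0.5, 0.5, -0.5, -0.5]  # row balance coefficients

# Direct computation from the logcontrast scores, for comparison.
za = [sum(ai * math.log(v) for ai, v in zip(a, s)) for s in samples]
zb = [sum(bi * math.log(v) for bi, v in zip(b, s)) for s in samples]
direct_var, direct_cov = sample_cov(za, za), sample_cov(za, zb)
```

The identity is purely algebraic, so it holds for sample variances as well as for population ones; the two computations agree up to floating-point rounding.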
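Theorem 3.2 Consider an arbitrary coordinate $z_{rs}$, for $r = 2, \dots, I$ and $s = 2, \dots, J$, of the interaction table $\mathbf{x}^{int}$ from (5). Its variance is formed by three parts,

$$\operatorname{var}(z_{rs}) = \frac{1}{rs(r-1)(s-1)} \left( A_1 - B_1 - C_1 \right), \qquad (9)$$

where the indices $i, i'$ run over $1, \dots, r-1$ and $j, j'$ over $1, \dots, s-1$. The first part, increasing the variance, is

$$A_1 = (r-1) \sum_{i} \sum_{j} \sum_{j'} \operatorname{var}\ln\frac{x_{ij}}{x_{rj'}} + (s-1) \sum_{i} \sum_{j} \sum_{i'} \operatorname{var}\ln\frac{x_{ij}}{x_{i's}} + (r-1)^2 (s-1) \sum_{j} \operatorname{var}\ln\frac{x_{rj}}{x_{rs}} + (r-1)(s-1)^2 \sum_{i} \operatorname{var}\ln\frac{x_{is}}{x_{rs}}.$$

The variance of the coordinate is reduced by parts

$$B_1 = \frac{1}{2} \sum_{i,i'} \sum_{j,j'} \operatorname{var}\ln\frac{x_{ij}}{x_{i'j'}} + (r-1)(s-1) \sum_{i} \sum_{j} \operatorname{var}\ln\frac{x_{ij}}{x_{rs}}$$

and

$$C_1 = \frac{(r-1)^2}{2} \sum_{j,j'} \operatorname{var}\ln\frac{x_{rj}}{x_{rj'}} + \frac{(s-1)^2}{2} \sum_{i,i'} \operatorname{var}\ln\frac{x_{is}}{x_{i's}} + (r-1)(s-1) \sum_{i} \sum_{j} \operatorname{var}\ln\frac{x_{is}}{x_{rj}}.$$

Proof: When the parts of the compositional table $\mathbf{x}$ are rearranged in the form of a composition $\mathbf{x}^{\mathrm{r}} = (x_{11}, x_{12}, \dots, x_{1J}, x_{21}, \dots, x_{IJ})$, the coordinate $z_{rs}$ of the interaction table can be expressed as $z_{rs} = \mathbf{a}^{\top} \ln \mathbf{x}^{\mathrm{r}}$, where for elements of the coefficient vector $\mathbf{a} = (a_{11}, a_{12}, \dots, a_{1J}, a_{21}, \dots, a_{IJ})$ the following relations hold,

$$a_{ij} = \frac{1}{\sqrt{rs(r-1)(s-1)}} \times \begin{cases} 1 & \text{for } i < r,\ j < s, \\ -(s-1) & \text{for } i < r,\ j = s, \\ -(r-1) & \text{for } i = r,\ j < s, \\ (r-1)(s-1) & \text{for } i = r,\ j = s, \\ 0 & \text{otherwise.} \end{cases}$$

Equation (9) is then a consequence of Proposition 3.1.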
From Theorem 3.2 it is clear that the variance of the coordinate $z_{rs}$ is formed by nine groups of logratio variances. Four of them increase the overall variability and the other five reduce it.

The first four groups are represented by $A_1$, which is formed by logratios of "inner" parts of the partial table, or of the part $x_{rs}$, with parts from its last row and column (i.e. the r-th row and s-th column of the original table $\mathbf{x}$), except for the part $x_{rs}$ itself:

• variances of logratios between an inner part of the partial table and a part from its last row (except for $x_{rs}$),
• variances of logratios between an inner part of the partial table and a part from its last column (except for $x_{rs}$),
• variances of logratios between a part from the last row (except for $x_{rs}$) and $x_{rs}$ itself,
• variances of logratios between a part from the last column (except for $x_{rs}$) and $x_{rs}$ itself.

The variance of $z_{rs}$ is reduced by $B_1$ and $C_1$, formed by variances of logratios corresponding to the remaining possible relations between parts of the above defined groups (inner parts, last row/column without $x_{rs}$, the part $x_{rs}$ itself). Concretely, $B_1$ consists of

• variances of logratios between inner parts of the partial table,
• variances of logratios between an inner part and $x_{rs}$.

Similarly, $C_1$ is formed by

• variances of logratios between parts from the last row (except for $x_{rs}$),
• variances of logratios between parts from the last column (except for $x_{rs}$),
• variances of logratios between parts from the last row and the last column (except for $x_{rs}$).
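The sign pattern of the nine groups can be reproduced with a short numerical sketch. It assumes that the interaction coordinate has the odds ratio form z_rs = (rs(r−1)(s−1))^(−1/2) ln ∏_{i<r} ∏_{j<s} x_ij x_rs / (x_is x_rj); this form and the function names are assumptions of the sketch, not a quotation of the original formulas. By Proposition 3.1, the weight of var ln(x_p/x_q) in var(z_rs) is then minus the product of the two coefficients, with each unordered pair counted once.

```python
import math


def interaction_coefficients(n_rows, n_cols, r, s):
    """Coefficient table of z_rs (1-based r, s >= 2) under the assumed form."""
    c = 1.0 / math.sqrt(r * s * (r - 1) * (s - 1))
    a = [[0.0] * n_cols for _ in range(n_rows)]
    for i in range(r - 1):
        for j in range(s - 1):
            a[i][j] = c                      # inner parts of the partial table
        a[i][s - 1] = -(s - 1) * c           # last column, except x_rs
    for j in range(s - 1):
        a[r - 1][j] = -(r - 1) * c           # last row, except x_rs
    a[r - 1][s - 1] = (r - 1) * (s - 1) * c  # the part x_rs itself
    return a


def pair_weight(a, p, q):
    """Weight of var ln(x_p / x_q) in var(z_rs); p, q are 0-based (row, col)."""
    return -a[p[0]][p[1]] * a[q[0]][q[1]]
```

For a 3 × 3 table and r = s = 3, for instance, `pair_weight` is positive for an inner part paired with a part from the last row or column and negative for two inner parts, in line with the groups listed above.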
The above relations can be expressed also graphically, as shown in Figure 1.
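Covariances between coordinates of the interaction table are derived in the next theorem.

Theorem 3.3 Consider two coordinates of the interaction table, $z_{r_1 s_1}$ and $z_{r_2 s_2}$, for $r_1, r_2 = 2, \dots, I$ and $s_1, s_2 = 2, \dots, J$. Then for their covariance the following holds,

$$\operatorname{cov}(z_{r_1 s_1}, z_{r_2 s_2}) = \frac{c_1 c_2}{2} \left( A_2 + B_2 - C_2 - D_2 \right), \qquad c_k = \frac{1}{\sqrt{r_k s_k (r_k - 1)(s_k - 1)}},$$

where, with indices $i = 1, \dots, r_1 - 1$, $j = 1, \dots, s_1 - 1$, $k = 1, \dots, r_2 - 1$ and $l = 1, \dots, s_2 - 1$,

$$A_2 = (s_2 - 1) \sum_{i,j} \sum_{k} \operatorname{var}\ln\frac{x_{ij}}{x_{k s_2}} + (r_2 - 1) \sum_{i,j} \sum_{l} \operatorname{var}\ln\frac{x_{ij}}{x_{r_2 l}} + (r_1 - 1)(s_1 - 1)(s_2 - 1) \sum_{k} \operatorname{var}\ln\frac{x_{r_1 s_1}}{x_{k s_2}} + (r_1 - 1)(s_1 - 1)(r_2 - 1) \sum_{l} \operatorname{var}\ln\frac{x_{r_1 s_1}}{x_{r_2 l}},$$

$$B_2 = (s_1 - 1) \sum_{i} \sum_{k,l} \operatorname{var}\ln\frac{x_{i s_1}}{x_{kl}} + (s_1 - 1)(r_2 - 1)(s_2 - 1) \sum_{i} \operatorname{var}\ln\frac{x_{i s_1}}{x_{r_2 s_2}} + (r_1 - 1) \sum_{j} \sum_{k,l} \operatorname{var}\ln\frac{x_{r_1 j}}{x_{kl}} + (r_1 - 1)(r_2 - 1)(s_2 - 1) \sum_{j} \operatorname{var}\ln\frac{x_{r_1 j}}{x_{r_2 s_2}},$$

$$C_2 = \sum_{i,j} \sum_{k,l} \operatorname{var}\ln\frac{x_{ij}}{x_{kl}} + (r_2 - 1)(s_2 - 1) \sum_{i,j} \operatorname{var}\ln\frac{x_{ij}}{x_{r_2 s_2}} + (r_1 - 1)(s_1 - 1) \sum_{k,l} \operatorname{var}\ln\frac{x_{r_1 s_1}}{x_{kl}} + (r_1 - 1)(s_1 - 1)(r_2 - 1)(s_2 - 1) \operatorname{var}\ln\frac{x_{r_1 s_1}}{x_{r_2 s_2}}$$

and

$$D_2 = (s_1 - 1)(s_2 - 1) \sum_{i} \sum_{k} \operatorname{var}\ln\frac{x_{i s_1}}{x_{k s_2}} + (s_1 - 1)(r_2 - 1) \sum_{i} \sum_{l} \operatorname{var}\ln\frac{x_{i s_1}}{x_{r_2 l}} + (r_1 - 1)(s_2 - 1) \sum_{j} \sum_{k} \operatorname{var}\ln\frac{x_{r_1 j}}{x_{k s_2}} + (r_1 - 1)(r_2 - 1) \sum_{j} \sum_{l} \operatorname{var}\ln\frac{x_{r_1 j}}{x_{r_2 l}}.$$

Proof: The covariances are obtained using the general formula (8), where the corresponding coefficient vectors $\mathbf{a}_1$ and $\mathbf{a}_2$ have elements

$$a_{k,ij} = c_k \times \begin{cases} 1 & \text{for } i < r_k,\ j < s_k, \\ -(s_k - 1) & \text{for } i < r_k,\ j = s_k, \\ -(r_k - 1) & \text{for } i = r_k,\ j < s_k, \\ (r_k - 1)(s_k - 1) & \text{for } i = r_k,\ j = s_k, \\ 0 & \text{otherwise,} \end{cases}$$

and $k = 1, 2$.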
As in the case of variances, there is a group of logratio variances that increases the overall covariance between coordinates ($A_2$ and $B_2$), while the remaining variances reduce it ($C_2$ and $D_2$). Here we always deal with two "virtual" partial tables corresponding to the coordinates of interest. Specifically, the logratios in $A_2$ are constructed from

• an inner part of the first partial table and a part from the last column of the second partial table (except for $x_{r_2 s_2}$),
• an inner part of the first partial table and a part from the last row of the second partial table (except for $x_{r_2 s_2}$),
• the part $x_{r_1 s_1}$ and a part from the last column of the second partial table (except for $x_{r_2 s_2}$),
• the part $x_{r_1 s_1}$ and a part from the last row of the second partial table (except for $x_{r_2 s_2}$).

Similarly, $B_2$ is formed by variances of logratios of

• a part from the last column of the first partial table (except for $x_{r_1 s_1}$) and an inner part of the second partial table,
• a part from the last column of the first partial table (except for $x_{r_1 s_1}$) and the part $x_{r_2 s_2}$,
• a part from the last row of the first partial table (except for $x_{r_1 s_1}$) and an inner part of the second partial table,
• a part from the last row of the first partial table (except for $x_{r_1 s_1}$) and the part $x_{r_2 s_2}$.

On the other hand, the covariance is reduced by $C_2$, involving logratios between

• an inner part of the first partial table and an inner part of the second partial table,
• an inner part of the first partial table and the part $x_{r_2 s_2}$,
• the part $x_{r_1 s_1}$ and an inner part of the second partial table,
• the parts $x_{r_1 s_1}$ and $x_{r_2 s_2}$,

and by $D_2$, consisting of logratios formed by

• a part from the last column of the first partial table (except for $x_{r_1 s_1}$) and a part from the last column of the second partial table (except for $x_{r_2 s_2}$),
• a part from the last column of the first partial table (except for $x_{r_1 s_1}$) and a part from the last row of the second partial table (except for $x_{r_2 s_2}$),
• a part from the last row of the first partial table (except for $x_{r_1 s_1}$) and a part from the last column of the second partial table (except for $x_{r_2 s_2}$),
• a part from the last row of the first partial table (except for $x_{r_1 s_1}$) and a part from the last row of the second partial table (except for $x_{r_2 s_2}$).
The covariance between two coordinates of the interaction table can also be supported by a graphical representation, see Figure 2.
Since the coordinates of the independence table (3), (4) are balances obtained from sequential binary partitions, dividing rows and columns of the original table, respectively (Egozcue and Pawlowsky-Glahn 2005), their variances and covariances are obtained as a direct consequence of Fišerová and Hron (2011).
The variances of these coordinates are enlarged by variances of logratios between a part from the k-th row/l-th column and any part from the subsequent rows/columns.On the other hand, the variances of z r k and z c l are reduced by variances of logratios between parts from the same row/column.
According to relation (8), there are three main cases for the covariance between coordinates of the independence table, depending on the concrete balances of interest. All these possible covariances are summarized in the following theorem.
To complete the covariance structure of coordinates of the compositional table x, covariances between coordinates of the interaction and independence tables are necessary. They are provided in the last theorem.
Theorem 3.6 Consider a coordinate of the interaction table, $z_{rs}$, for $r = 2, \dots, I$ and $s = 2, \dots, J$, and two coordinates of the independence table, $z^{r}_{k}$, for $k = 1, \dots, I - 1$, and $z^{c}_{l}$, for $l = 1, \dots, J - 1$. Then the covariances $\operatorname{cov}(z_{rs}, z^{r}_{k})$ and $\operatorname{cov}(z_{rs}, z^{c}_{l})$ between coordinates of the interaction and independence tables are linear combinations of logratio variances, with the groups of parts that increase and reduce them indicated in Figure 3.

Proof: The assertion of the theorem is a direct consequence of Proposition 3.1 and Equations (3), (4) and (5).
As in the case of the interaction table, the above results can also be interpreted graphically. Because Theorems 3.4 and 3.5 represent a special case of balances, which were analyzed in detail in Fišerová and Hron (2011), in Figure 3 we focus just on the covariances resulting from Theorem 3.6.

Implications for 2 x 2 compositional tables
In practice, 2 × 2 compositional (and also contingency) tables represent a prominent special case that requires special treatment (Fačevicová et al. 2014a; Agresti 2002). From Equations (3), (4) and (5) it is easy to see that for the coordinate representation of the compositional table it is sufficient to consider the following coordinates,

$$z^{r}_{1} = \frac{1}{2} \ln\frac{x_{11} x_{12}}{x_{21} x_{22}}, \qquad z^{c}_{1} = \frac{1}{2} \ln\frac{x_{11} x_{21}}{x_{12} x_{22}}, \qquad z_{22} = \frac{1}{2} \ln\frac{x_{11} x_{22}}{x_{12} x_{21}}.$$

For these coordinates, each covariance given by (8) reduces to a difference of two logratio variances. In other words, zero covariances can be easily expressed in terms of logratio variances. Consequently, the above relations could be used, e.g., when designing simulation settings for 2 × 2 compositional tables, using elements of the variation matrix as the source of elemental information on the covariance structure of compositional tables.
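As a worked illustration, the covariances for the 2 × 2 case can be written out explicitly. The derivation below is a sketch under the assumption that the three coordinates are the row balance, the column balance and the log odds ratio, i.e. logcontrasts with coefficient vectors ½(1, 1, −1, −1), ½(1, −1, 1, −1) and ½(1, −1, −1, 1) on (ln x11, ln x12, ln x21, ln x22); substituting these vectors into the logcontrast covariance formula of Proposition 3.1 gives:

```latex
\begin{align*}
\operatorname{cov}(z^{r}_{1}, z^{c}_{1})
  &= \tfrac{1}{4}\Bigl[\operatorname{var}\ln\frac{x_{11}}{x_{22}}
     - \operatorname{var}\ln\frac{x_{12}}{x_{21}}\Bigr], \\
\operatorname{cov}(z^{r}_{1}, z_{22})
  &= \tfrac{1}{4}\Bigl[\operatorname{var}\ln\frac{x_{11}}{x_{21}}
     - \operatorname{var}\ln\frac{x_{12}}{x_{22}}\Bigr], \\
\operatorname{cov}(z^{c}_{1}, z_{22})
  &= \tfrac{1}{4}\Bigl[\operatorname{var}\ln\frac{x_{11}}{x_{12}}
     - \operatorname{var}\ln\frac{x_{21}}{x_{22}}\Bigr].
\end{align*}
```

Under these assumptions, two of the coordinates are uncorrelated exactly when the corresponding pair of logratio variances coincides; for example, the row and column balances are uncorrelated if and only if var ln(x11/x22) = var ln(x12/x21).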
Following Fačevicová et al. (2014a), it is also possible to assign another system of orthonormal coordinates to a 2 × 2 compositional table. Specifically, we get one coordinate, $z^{int}$, for the interaction table and two coordinates, $z^{ind}_{1}$ and $z^{ind}_{2}$, for the independence table, and the covariance structure changes accordingly. In particular, $\operatorname{var}\ln(x_{12}/x_{21})$ and $\operatorname{var}\ln(x_{11}/x_{22})$ are influential just for the variances of coordinates $z^{ind}_{1}$, $z^{ind}_{2}$, $z^{int}$, forming also natural constraints for their possible values.

By comparing with the corresponding elements of the variation matrix we can conclude that none of the logratios contributes exceptionally (in the positive sense) to the variability of the coordinate. In the negative sense, the logratio ln(underweight or normal weight in age 45-64 / overweight or obesity in age 25-44) shows a dominant effect. Similarly, other variances and covariances can also be derived (and further analysed for structural patterns), resulting in the covariance matrix var(z).

Numerical example
Finally, note that by considering both the markedly nonzero means of the coordinates of the interaction table (the first two elements of the vector z) and their correspondingly small variances, we can conclude that, based on the considered sample, age and BMI index are not independent.

Discussion
Recent experience with orthonormal coordinates for compositional data (Reimann, Filzmoser, Fabian, Hron, Birke, Demetriades, Dinelli, and Ladenberger 2012; Filzmoser and Walczak 2014) shows clearly the necessity of understanding them better in terms of logratios, which could also be achieved by decomposing the corresponding covariance structure. This is even more crucial for compositional tables, where both balances and coordinates with an odds ratio interpretation are involved. Obviously, due to the complex character of the above formulas for the covariance structure in compositional tables, they will rather rarely be used for practical computations. Therefore, the formulas are accompanied by comments and graphical illustrations that convey their logical structure, which matters much more for the aim of the paper. Consequently, as in the case of balances (Fišerová and Hron 2011), we are convinced that decomposing variances and covariances into linear combinations of logratio variances enhances the interpretability of coordinates of compositional tables, using logratios as the primary source of information in compositional data.

Figure 1: The variance of coordinate $z_{rs}$ is increased by variances of logratios between a part from the area highlighted by (/) and a part from the second area highlighted by (\) (term $A_1$). The variance of $z_{rs}$ is reduced by variances of logratios between two parts from area (/) (term $B_1$) or two parts from area (\) (term $C_1$).

Figure 2: The covariance between coordinates $z_{r_1 s_1}$ and $z_{r_2 s_2}$ is increased by variances of logratios between a part of the first partial table from the area highlighted by (/) and a part of the second partial table from the area highlighted by (|) (term $A_2$). The second group of variances increasing the covariance between coordinates is connected to logratios between parts from the (\) and (−) areas (term $B_2$). The covariance is reduced by variances of logratios between parts from the (/) and (−) areas (term $C_2$) or parts from the (\) and (|) areas (term $D_2$).
computed using expression (3), and three coordinates z c l 1 , z c l 2

Figure 3: Covariance between a coordinate of the interaction table, z rs (left), and coordinates of the independence table, z r k (middle) or z c l (right), is increased by variances of logratios between parts from areas (/) and (−), or (\) and (|), respectively, and reduced by variances of logratios between parts from areas (/) and (|), or (\) and (−), respectively.
Aitchison geometry it is possible to decompose the original compositional table into its independent and interactive parts, $\mathbf{x} = \mathbf{x}^{ind} \oplus \mathbf{x}^{int}$, see Egozcue et al. (2008) for details. The independent part (independence table) is a compositional table with elements

Now, although the coordinates of the independence table are formed just by (scaled) logratios, the covariance structure becomes more complex than before. For example, coordinates (32) are mutually uncorrelated (independent) if, and only if

To illustrate the derived theoretical outputs, let us consider a sample of eighteen 2 × 3 compositional tables, each reflecting the population structure in a European country according to age and BMI index ((weight in kg)/(height in m)^2), with values 25-44, 45-64, 65-84 and under- or normal weight and overweight or obesity, respectively. The data set is an aggregated version of data from Fačevicová et al. (2014b). Table 1 shows an example of a compositional table from the sample.

Table 1: Structure of population in Austria in 2008 according to age and BMI index (in

For example, using this matrix and equation (9), the variance of the first coordinate of the interaction table, $z_{22}$, can be obtained as $\operatorname{var}(z_{22}) = \frac{1}{4} \operatorname{var}\ln\frac{x_{11}}{x_{21}} + \frac{1}{4} \operatorname{var}\ln\frac{x_{11}}{x_{12}} + \frac{1}{4} \operatorname{var}\ln\frac{x_{21}}{x_{22}} + \frac{1}{4} \operatorname{var}\ln x_{12}$ …