Taxicab Correspondence Analysis and Taxicab Logratio Analysis: A Comparison on Contingency Tables and Compositional Data

In this paper, we attempt to see further by relating theory with practice. First, we review the principles on which three interrelated, well-developed methods for the analysis and visualization of contingency tables and compositional data are erected: correspondence analysis, based on Benzécri's principle of distributional equivalence; Goodman's RC association model, based on Yule's principle of scale invariance; and compositional data analysis, based on Aitchison's principle of subcompositional coherence. Second, we introduce a novel index, named the intrinsic measure of the quality of the signs of the residuals, for the choice of the method. The criterion is based on taxicab singular value decomposition, on which the package TaxicabCA in R is developed. We present a minimal R script that can be executed to obtain the numerical results and the maps in this paper. Third, we introduce a flexible method based on the novel index for the choice of the constant to be added to contingency tables with zero counts so that logratio methods can be applied.


Introduction
We start by citing Tukey (1977, p. 400): "the general maxim (it is a rare thing that a specific body of data tells us clearly enough how it itself should be analyzed) applies to choice of reexpression for two-way analysis."
We consider correspondence analysis (CA) and logratio analysis (LRA) as two different popular well-developed choices of re-expression for the analysis and visualization of a contingency table (two-way frequency counts data having I rows and J columns) or a compositional data set (I individuals, also named samples, of J compositional parts).

Motivating example
Consider the contrived 3 by 3 data sets X_1 and X_2 reproduced below from Table 9, set 1 of Goodman (1996), where he compared different methods, including CA and LRA. From a mathematical point of view, the most important aspect is that the rank of each data set is 2 or 3 depending on the method used: by the parsimony principle of Occam's razor, one should of course prefer the method that reduces the rank of the data set. For

X_1 = (400 100 100
       100  50 100
       100 100 400)

we have rank X_1 = 2 by LRA and rank X_1 = 3 by CA. And for

X_2 = (400 100 100
       100  40 100
       100 100 400)

we have rank X_2 = 3 by LRA and rank X_2 = 2 by CA.
It is important to note that the structure of real observed data differs fundamentally from the structure of artificial data. So we need a criterion for the choice between CA or LRA.

Aims and organization
The essay entitled "On Practice" by Mao (1938) complements Tukey's above maxim, because its subtitle is "On the Relation Between Knowledge and Practice, Between Knowing and Doing". For a modern interpretation of Mao's cited work, see Zizek (2008).
Our aim is to relate theory with practice. Given that knowledge (knowing, theory, method) starts with principles, we first discuss the three basic principles, which are the starting points of each of the three methods: CA, RC association models and CoDA. Then, for data analysis, we introduce and apply a simple intuitive criterion for the choice of the preprocessing, and consequently of the method. The novel criterion is the intrinsic variability measure of a principal dimension by the quality of the signs of the residuals (QSR), recently introduced by Choulakian (2021). For each principal dimension, QSR is calculated via taxicab singular value decomposition (TSVD), on which the package TaxicabCA in R is developed by Allard and Choulakian (2019). Furthermore, we use the QSR for the choice of the constant to be added to zero cells in contingency tables so that LRA can be applied. This paper is organized as follows: Section 2 discusses the fundamental differences between contingency tables and compositional data via the role of simplices; Section 3 presents an overview of taxicab singular value decomposition (TSVD); Section 4 presents the computational steps of the methods TCA (taxicab correspondence analysis) and TLRA (taxicab logratio analysis) for contingency tables, then for compositional data; Section 5 reviews the three principles; Section 6 introduces the QSR index; and Section 7 presents examples. Finally, we conclude in Section 8. The appendix presents a minimal R script to execute the computations in this paper.

Simplices in contingency tables and in compositional data sets
From a statistical point of view there is a fundamental difference between the structures of a two-way contingency table N = (n ij ) and a compositional data set X = (x ij ) for i = 1, ..., I and j = 1, ..., J; while from a mathematical point of view the form of the resulting equations arising from different departure assumptions may be identical.

One simplex in a compositional data set
In a compositional data set, I represents the sample size, that is, the number of individuals observed; each individual is described by J nonnegative variables (discrete or continuous), named parts, measured in the same unit, such as ounce or gram, $ or €; and the proportion of each part, y_ij = x_ij / Σ_j x_ij, is important. For example, suppose we note the expenses of a one-day stay in Paris and New York, and measure our expenses on three items: a one-night stay in Holiday Inn, the price of one cup of coffee at Starbucks, and the price of a Big Mac at McDonald's in each of the cities. So we have 2 vectors with 3 parts, where the measurement unit in Paris is euros and in New York dollars; the x_ij in $ or € are not comparable, but the proportions are comparable via logratios. So geometrically, the i-th row is found in the unit simplex S^J = {(y_ij) | 0 ≤ y_ij and Σ_j y_ij = 1}.
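This unit invariance can be illustrated with a small Python sketch (the expense figures below are hypothetical, not data from the paper): after closure to the unit simplex, proportions and logratios are unaffected by the choice of currency.

```python
import numpy as np

# Hypothetical one-day expenses (hotel night, coffee, Big Mac):
# Paris in euros, New York in dollars; illustrative numbers only.
paris_eur = np.array([150.0, 4.0, 10.0])
ny_usd = np.array([180.0, 5.0, 11.0])

def closure(x):
    """Closure: map a nonnegative vector to the unit simplex S^J."""
    return x / x.sum()

# Re-expressing Paris in dollars at some exchange rate rescales the whole
# vector, leaving y_ij = x_ij / sum_j x_ij unchanged.
rate = 1.08  # hypothetical EUR -> USD rate
paris_usd = rate * paris_eur
assert np.allclose(closure(paris_eur), closure(paris_usd))
# Logratios between parts are likewise scale free:
assert np.allclose(np.log(paris_eur[0] / paris_eur[1]),
                   np.log(paris_usd[0] / paris_usd[1]))
```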

Three simplices in a contingency table
In a contingency table N = (n_ij), n = Σ_{i,j} n_ij represents the sample size, that is, the number of individuals observed; each individual is described by two categorical variables: a row categorical variable having I levels and a column categorical variable having J levels. There are three unit simplices in a contingency table: S^{IJ}, representing the joint probability measure on R^{IJ}_+; S^J, representing the conditional probability measure defined on the level i of the row variable; and S^I, representing the conditional probability measure defined on the level j of the column variable. CA acts simultaneously on the two simplices S^J and S^I via Benzécri's chi-squared metrics; while TCA acts simultaneously on the three simplices S^{IJ}, S^J and S^I via TSVD.
A contingency table N can also be represented (coded) as an indicator matrix Z = [Z_I Z_J] = [(z_αi) (z_αj)] of size n by (I + J), where z_αi = 0 if individual α does not have level i of the row variable and z_αi = 1 if it does; z_αj = 0 if individual α does not have level j of the column variable and z_αj = 1 if it does. Note that N = Z_I' Z_J. CA of Z is mathematically equivalent to CA of N, but the geometric structure of Z is not clear, see Greenacre and Hastie (1987). TCA of Z is not mathematically equivalent to TCA of N, see Choulakian (2008a).
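The indicator coding can be illustrated with a short Python sketch (NumPy; the small table below is hypothetical): we build Z_I and Z_J row by row, one row per observed individual, and check that N = Z_I' Z_J.

```python
import numpy as np

# A small contingency table N (counts); rows and columns are category levels.
N = np.array([[2, 1],
              [0, 3]])
I, J = N.shape
n = N.sum()

# Build the indicator coding Z = [Z_I  Z_J]: one row per individual,
# a 1 marking its row level and its column level.
rows, cols = [], []
for i in range(I):
    for j in range(J):
        for _ in range(N[i, j]):
            zi = np.zeros(I); zi[i] = 1
            zj = np.zeros(J); zj[j] = 1
            rows.append(zi); cols.append(zj)
Z_I, Z_J = np.array(rows), np.array(cols)

# The cross-product of the two blocks recovers N: N = Z_I' Z_J.
assert np.array_equal(Z_I.T @ Z_J, N)
assert Z_I.shape == (n, I) and Z_J.shape == (n, J)
```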

An overview of taxicab singular value decomposition
Consider a matrix X of size I × J with rank(X) = k. Taxicab singular value decomposition (TSVD) of X is a decomposition similar to SVD of X, see Choulakian (2006, 2016). For a vector u = (u_i), its taxicab or L_1 norm is ||u||_1 = Σ_i |u_i|, its Euclidean or L_2 norm is ||u||_2 = (Σ_i |u_i|^2)^{1/2}, and its L_∞ norm is ||u||_∞ = max_i |u_i|.
The variational definitions of TSVD at the α-th iteration, applied to the residual matrix X_α (with X_1 = X), are

δ_α = max_u ||X_α u||_1 / ||u||_∞ = max_v ||X_α' v||_1 / ||v||_∞,   (1)

which, because the maxima are attained at vertices of the cubes, reduces to the combinatorial problem

δ_α = max{||X_α u||_1 : u ∈ {−1, +1}^J} = max{||X_α' v||_1 : v ∈ {−1, +1}^I}.   (2)

The α-th principal axes are u_α = arg max_{u ∈ {−1,+1}^J} ||X_α u||_1 and v_α = arg max_{v ∈ {−1,+1}^I} ||X_α' v||_1, and the α-th principal projections of the rows and of the columns are a_α = X_α u_α and b_α = X_α' v_α. Furthermore, the following transition relations are also useful:

v_α = sign(a_α),   (3)
u_α = sign(b_α),   (4)

where sign(.) is the coordinatewise sign function, sign(x) = 1 if x > 0 and sign(x) = −1 otherwise. The α-th taxicab dispersion measure δ_α can be represented in many different ways:

δ_α = ||a_α||_1 = ||b_α||_1 = v_α' X_α u_α.   (5)

The (α + 1)-th residual matrix is

X_{α+1} = X_α − a_α b_α' / δ_α.   (6)

An interpretation of the term a_α b_α'/δ_α in (6) is that it represents the best rank-1 approximation of the residual matrix X_α in the sense of the taxicab matrix norm (1).
Thus TSVD of X corresponds to the bilinear decomposition

X = Σ_{α=1}^k a_α b_α' / δ_α,   (7)

a decomposition similar to SVD, but where the vectors (a_α, b_α) for α = 1, ..., k are conjugate, that is,

a_α' v_β = b_α' u_β = 0 for α < β.   (8)

In the package TaxicabCA in R, the calculation of the principal component weights u_α and v_α is accomplished by three algorithms. The first one, based on the complete enumeration equation (2), is named exhaustive. The second one, based on iterating the transition formulae (3, 4), is named criss-cross. The third one, based on the genetic algorithm, is named genetic. The exhaustive algorithm is only feasible when I or J is small (by default, TaxicabCA does not use this algorithm if I and J are > 20). The speed of the other algorithms as a function of the table's dimensions I by J has not been studied yet. In the criss-cross approach, the starting values are based on the shorter representation of the singular value decomposition (SVD). For the interested reader, here are a few published references on the sizes of "relatively" large data sets: a health data set of size 3530 by 88 in Choulakian, Allard, and Simonetti (2013); an ecological abundance data set of size 285 by 220 in Choulakian (2017); an indicator data set of size 441 by 105 in Choulakian and de Tibeiro (2013); a survey data set of size 88000 by 120 in Das, Minjares-Kyle, Wu, and Henk (2019) and Das (2021).
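As an illustration, here is a minimal Python sketch of the first TSVD step on a small double-centered random matrix, implementing the complete enumeration (2) and the criss-cross iteration of the transition formulae (3, 4); it is a simplified stand-in for the TaxicabCA routines, and it uses a starting vector of ones rather than the package's SVD-based start.

```python
import numpy as np
from itertools import product

def tsvd_step_exhaustive(X):
    """delta = max ||X u||_1 over u in {-1,+1}^J (complete enumeration (2))."""
    best = None
    for u in product([-1.0, 1.0], repeat=X.shape[1]):
        u = np.array(u)
        val = np.abs(X @ u).sum()
        if best is None or val > best[0]:
            best = (val, u)
    return best

def tsvd_step_crisscross(X, u0):
    """Iterate the transition formulae v = sign(Xu), u = sign(X'v)."""
    u = np.sign(u0); u[u == 0] = 1
    for _ in range(1000):          # converges in finitely many steps
        a = X @ u                  # row projections
        v = np.sign(a); v[v == 0] = 1
        b = X.T @ v                # column projections
        u_new = np.sign(b); u_new[u_new == 0] = 1
        if np.array_equal(u_new, u):
            break
        u = u_new
    return np.abs(a).sum(), u, v, a, b

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 5))
X -= X.mean(axis=0); X -= X.mean(axis=1, keepdims=True)  # double-center

delta_ex, _ = tsvd_step_exhaustive(X)
delta_cc, u, v, a, b = tsvd_step_crisscross(X, np.ones(5))
# at convergence ||a||_1 = ||b||_1 = delta, equation (5);
# the criss-cross value is a local maximum, so delta_cc <= delta_ex
print(delta_ex, delta_cc)
```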

Analysis of contingency tables and compositional data
First we consider contingency tables, then relate the mathematics to compositional data sets.
Let P = N/n = (p_ij) of size I × J be the associated correspondence matrix (probability table) of a contingency table N. We define as usual p_i+ = Σ_{j=1}^J p_ij, p_+j = Σ_{i=1}^I p_ij, the vector r = (p_i+) ∈ R^I, the vector c = (p_+j) ∈ R^J, D_I = Diag(r) the diagonal matrix having diagonal elements p_i+, and similarly D_J = Diag(c). We suppose that D_I and D_J are positive definite metric matrices of size I × I and J × J, respectively; this means that the diagonal elements of D_I and D_J are strictly positive.

Independence of the row and column categories
a) The I row categories are independent of the J column categories when

σ_ij = p_ij − p_i+ p_+j = 0 for i = 1, ..., I and j = 1, ..., J,   (9)

where (σ_ij) is the matrix of residuals of p_ij with respect to the independence model p_i+ p_+j.
Remark 1. Note that σ_ij can be interpreted as the cross-covariance between the i-th row and the j-th column categories using the indicator matrix Z discussed in section 2.
b) The independence assumption σ_ij = 0 can also be interpreted in another way as

∆_ij = p_ij / (p_i+ p_+j) − 1 = 0 for i = 1, ..., I and j = 1, ..., J,   (10)

where ∆ = (∆_ij); this is the column and row homogeneity model. Benzécri (1973a, p. 31) named the conditional probability vector (p_ij/p_+j for i = 1, ..., I and j fixed) the profile of the j-th column, and the element p_ij/(p_i+ p_+j) the density function of the probability measure (p_ij) with respect to the product measure p_i+ p_+j. The element p_ij/(p_i+ p_+j) is named the Pearson ratio in Goodman (1996) and (Beh and Lombardo 2014, p. 123).
c) A third way to represent the independence assumption (9) and the row and column homogeneity model (10) is via the (w_i^R, w_j^C) weighted loglinear formulation: assuming p_ij > 0 and defining G_ij = log(p_ij),

λ_ij = G_ij − G_i+ − G_+j + G_++ = 0,   (11)

where G_i+ = Σ_j w_j^C G_ij, G_+j = Σ_i w_i^R G_ij, G_++ = Σ_{i,j} w_i^R w_j^C G_ij, and (w_i^R, w_j^C) are a priori fixed or data dependent probability weights. Two popular choices of weights are marginal (w_j^C = p_+j, w_i^R = p_i+) and uniform (w_j^C = 1/J, w_i^R = 1/I). This is implicit in equation 7 in Goodman (1996) and equation 2.2.6 in Goodman (1991); and explicit in Egozcue, Pawlowsky-Glahn, Templ, and Hron (2015).
Equation (11) is equivalent to the logratios log(p_ij p_{i1 j1} / (p_{i j1} p_{i1 j})) = 0 for i ≠ i1 and j ≠ j1, which Goodman (1979, equation 2.2) names the "null association" model.
Equation (11) is also equivalent to

p_ij = exp(G_i+) exp(G_+j) / exp(G_++),

from which we deduce that, under the independence assumption, the marginal row probability vector (p_i+) is proportional to the vector of weighted geometric means (exp(G_i+)); a similar property is also true for the columns; see for instance Egozcue et al. (2015).
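As a numerical check of these equivalences, the following Python sketch (illustrative marginals, not data from the paper) builds an exactly independent table, verifies that the marginally weighted association index (11) vanishes, and verifies that the row marginals are proportional to the weighted geometric means exp(G_i+).

```python
import numpy as np

# An exactly independent probability table: p_ij = p_i+ p_+j.
r = np.array([0.5, 0.3, 0.2])     # row marginals (illustrative)
c = np.array([0.4, 0.35, 0.25])   # column marginals (illustrative)
P = np.outer(r, c)

G = np.log(P)
wR, wC = r, c                     # marginal weights

# Weighted double-centering of G = log p, the association index (11).
Gi = G @ wC                       # weighted row means G_i+
Gj = wR @ G                       # weighted column means G_+j
Gtot = wR @ G @ wC                # grand weighted mean G_++
lam = G - Gi[:, None] - Gj[None, :] + Gtot

assert np.allclose(lam, 0)        # independence <=> null association
# The row marginals are proportional to exp(G_i+):
ratio = r / np.exp(Gi)
assert np.allclose(ratio, ratio[0])
```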

Interaction factorization
Suppose that the independence-homogeneity-null association models are not true; then each of the three equivalent model formulations (9, 10, 11) can be generalized to explain the nonindependence-nonhomogeneity-association, named interaction, among the I rows and the J columns by adding k bilinear terms, where k = rank(N) − 1. We designate any one of the interaction indices (9, 10, 11) by τ_ij. Benzécri (1973a, Vol. 1, p. 31-32) emphasized the importance of row and column weights or metrics in multidimensional data analysis; this is the reason that in the French data analysis circles any study starts with a triplet (Y, M_I, M_J), where Y = (τ_ij) and M_I = Diag(m_i^r) and M_J = Diag(m_j^c) are the row and column metric matrices. TCov, TCA and TLRA are factorizations of the interactions in (9, 10, 11) in three steps.
Step 1: We double-center Y = (τ_ij); that is,

Σ_i m_i^r τ_ij = 0 for j = 1, ..., J and Σ_j m_j^c τ_ij = 0 for i = 1, ..., I.   (12)

This double-centering step is necessary to have the important basic equations (17, 18) on which the QSR index is based.
Step 2: Calculate TSVD of

X = (x_ij) = (m_i^r m_j^c τ_ij)   (13)

as described in section 3. We name (a_α(i), b_α(j)) taxicab contribution scores, because following (5) they satisfy

Σ_i |a_α(i)| = Σ_j |b_α(j)| = δ_α.   (14)

Furthermore, they are centered following Step 1:

Σ_i a_α(i) = Σ_j b_α(j) = 0.   (15)

And they are conjugate (in TSVD conjugacy replaces orthogonality in SVD), so that at iteration α,

S = {i : a_α(i) > 0} with S ∪ S̄ = I an optimal partition of I, and T = {j : b_α(j) > 0} with T ∪ T̄ = J an optimal partition of J.   (16)

Besides (14), the taxicab dispersion δ_α additionally satisfies the following useful equations:

Σ_{i∈S} a_α(i) = −Σ_{i∈S̄} a_α(i) = Σ_{j∈T} b_α(j) = −Σ_{j∈T̄} b_α(j) = δ_α/2,   (17)

which tells us that the taxicab principal dimensions are balanced; and, writing x_α(i,j) for the (i,j)-th entry of X_α,

Σ_{i∈S} Σ_{j∈T} x_α(i,j) = Σ_{i∈S̄} Σ_{j∈T̄} x_α(i,j) = −Σ_{i∈S} Σ_{j∈T̄} x_α(i,j) = −Σ_{i∈S̄} Σ_{j∈T} x_α(i,j) = δ_α/4,   (18)

which tells us that the α-th principal dimension divides the residual data matrix X_α into 4 balanced quadrants, see Choulakian and Abou-Samra (2020).
Step 3: Calculate the taxicab principal factor scores (f_α(i), g_α(j)) of X by dividing each term in (13) by the weights m_i^r m_j^c: f_α(i) = a_α(i)/m_i^r and g_α(j) = b_α(j)/m_j^c, so that

τ_ij = Σ_{α=1}^k f_α(i) g_α(j) / δ_α;   (19)

(19) is named the "data reconstruction formula".
Remark 2. 1) We have four particular methods:
a) For Y = (τ_ij) = (σ_ij) and (M_I, M_J) = (Diag(1/I), Diag(1/J)), we get TCov analysis, known also as interbattery analysis, first proposed by Tucker (1958); later on, Tenenhaus and Augendre (1996) reintroduced it within correspondence analysis circles, where they showed that the Tucker decomposition by SVD produced on some correspondence tables a more interesting, more interpretable structure than CA.
b) For Y = (τ_ij) = (∆_ij) and (M_I, M_J) = (Diag(r), Diag(c)), we get TCA.
c) For Y = (τ_ij) = (λ_ij) with uniform weights (w_j^C, w_i^R) = (1/J, 1/I) and (M_I, M_J) = (Diag(1/I), Diag(1/J)), we get uwTLRA.
d) For Y = (τ_ij) = (λ_ij) with marginal weights (w_j^C, w_i^R) = (p_+j, p_i+) and (M_I, M_J) = (Diag(r), Diag(c)), we get mwTLRA.
The TCov and uwTLRA principal factor scores are uniformly weighted; the TCA and mwTLRA principal factor scores are marginally weighted. What is the consequence of this? The answer is Benzécri's principle of distributional equivalence, see Definition 4 below, which states that TCA and mwTLRA results are not changed if two proportional columns or rows are merged into one. This has the practical consequence that the effective size of sparse and large data sets can be smaller than the observed size; see Example 2 in section 7. For further details concerning sparse contingency tables see Choulakian (2017).
2) Aitchison (1983, 1997) and Aitchison and Greenacre (2002) presented principal components analysis of a compositional data set by applying SVD to the preprocessed data of equation (11) using uniform weights. According to Greenacre and Lewi (2009), the marginally-weighted LRA, named spectral mapping, originates with Lewi (1976).
This shows that, from a mathematical point of view, the form of the resulting equations arising from two different departure assumptions may be identical in Goodman's RC association models for contingency tables and in Aitchison's principal components analysis of compositional data.
3) In all methods, the symmetric maps are obtained by plotting (f_α(i), f_β(i)) or (g_α(j), g_β(j)) for α ≠ β. Gower (2011) enumerates nine tools of interpretation of maps (biplots), one of them being the 'nearness' of points. In the case of TCA (or CA), the same idea is explicitly expressed by (Benzécri 1966, p. 56) as: "que des éléments distributionnellement proches soient proches sur le diagramme et réciproquement" ("that elements that are distributionally near be near on the diagram, and reciprocally"). For connections between TCov and TCA maps, see Choulakian (2021). For TLRA, the 'nearness' of points can also be seen in equation (20) approximating the log-odds ratios. Often prior knowledge is also used for interpretation, see Example 1 in section 7.

The three principles
In this section we review and discuss the three basic principles on which CA and LRA (RC association models and CoDA) are erected; the discussion is within the framework of TSVD where needed. We observe that the aim of each method was different; only Goodman (1996) attempted to reconcile the CA and LRA methods, because that was his principal aim.

Yule's principle of scale invariance
We start by quoting (Goodman 1996, section 10) to really understand Yule's principle of scale invariance: "Pearson's approach to the analysis of cross-classified data was based primarily on the bivariate normal. He assumed that the row and column classifications arise from underlying continuous random variables having a bivariate normal distribution, so that the sample contingency table comes from a discretized bivariate normal; and he then was concerned with the estimation of the correlation coefficient for the underlying bivariate normal. On the other hand, Yule felt that, for many kinds of contingency tables, it was not desirable in scientific work to introduce assumptions about an underlying bivariate normal in the analysis of these tables; and for such tables, he used, to a great extent, coefficients based on the odds-ratios (for example, Yule's Q and Y), coefficients that did not require any assumptions about underlying distributions. The Pearson approach and the Yule approach appear to be wholly different, but a kind of reconciliation of the two perspectives was obtained in Goodman (1981a)".
An elementary exposition of these ideas, with examples, can also be found in Mosteller (1968). Goodman (1996)'s reconciliation proceeds along two approaches. First, Goodman (1996) proposed "marginal-free correspondence analysis", which is discussed in Choulakian (2022).
Second, it is based on defining the a priori weights in the association index (11), where, by its decomposition into bilinear terms, mwLRA corresponds to Pearson's approach, while uwLRA corresponds to Yule's approach. This is because the log-odds ratios satisfy

log(p_ij p_{i1 j1} / (p_{i j1} p_{i1 j})) = λ_ij − λ_{i j1} − λ_{i1 j} + λ_{i1 j1} = Σ_{α=1}^k (f_α(i) − f_α(i1)) (g_α(j) − g_α(j1)) / δ_α,   (20)

where the principal factor scores satisfy marginally or uniformly weighted relations, see Remark 2 (1c, 1d).
We note that, in a similar spirit (see Remark 4), Kazmierczak (1985, 1987) reconciled Yule's scale invariance principle with Benzécri's principle of distributional equivalence, but in a mathematical framework in search of Euclidean metrics that satisfy both principles. To have a clear picture of LRA with general a priori prescribed weights (w_j^C, w_i^R), we first study the properties of the association index λ_ij that distinguish it from the interaction indices (9, 10).

Scale invariance of an interaction index
We are concerned with the property of scale dependence or independence of the three interaction indices (9, 10, 11). We note that in (9, 10, 11), p_ij depends on n_ij: p_ij = n_ij / Σ_{i,j} n_ij. To emphasize this dependence, we express an interaction index by τ_ij(n_ij) = τ(p_ij, w_i^R, w_j^C), where: in the case of the association index, τ_ij(n_ij) = λ_ij defined in (11); in the case of the nonhomogeneity index, τ_ij(n_ij) = ∆_ij defined in (10); and in the case of the nonindependence index, τ_ij(n_ij) = σ_ij defined in (9). Following Yule (1912), we state the following.
Definition 1. An interaction index τ_ij(n_ij) is scale invariant if τ_ij(a_i n_ij b_j) = τ_ij(n_ij) for all strictly positive constants a_i and b_j.
Lemma 1. The association index with a priori fixed weights is scale invariant:

λ(a_i p_ij b_j, w_j^C, w_i^R) = λ(p_ij, w_j^C, w_i^R).   (21)

It is important to note that Yule's principle of scale invariance concerns a function of four interaction terms, see equation (20); while in Definition 1 the invariance concerns either each interaction term in a contingency table, discussed by Goodman, or the relative values of a compositional vector, discussed by Aitchison (see subsection 5.3). A similar interpretation is expressed in Kazmierczak (1985).
It is evident that neither the interaction indices (9, 10) nor mwLRA is scale invariant, because they are marginal-dependent.
Concerning the association index (11), we have
Lemma 2. To a first-order approximation,

λ(p_ij, w_j^C, w_i^R) ≈ t_ij − Σ_j w_j^C t_ij − Σ_i w_i^R t_ij + Σ_{i,j} w_i^R w_j^C t_ij, where t_ij = p_ij / (w_i^R w_j^C).

Proof. The values of the density function t_ij are distributed around 1. By Taylor series expansion of log x in the neighborhood of x = 1, we have to first order log x ≈ x − 1. Putting a_i = 1/w_i^R and b_j = 1/w_j^C in (21), then applying log t_ij ≈ t_ij − 1 and the weighted double-centering in (11), we get the required result.
Remark 3. Lemma 2 provides a first order approximation to mwTLRA and uwTLRA, where we see that both first-order approximations are marginal-dependent but in different ways.
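The marginally weighted case can be checked numerically with the following Python sketch (an illustrative table close to independence, not data from the paper): the marginally weighted association index λ_ij is compared with the Pearson-ratio residual p_ij/(p_i+ p_+j) − 1, which is its first-order approximation.

```python
import numpy as np

rng = np.random.default_rng(1)
# A table close to independence, so the density t_ij = p_ij/(p_i+ p_+j)
# lies near 1 and log t ~ t - 1 is accurate.
r = np.array([0.5, 0.3, 0.2]); c = np.array([0.4, 0.35, 0.25])
P = np.outer(r, c) * (1 + 0.001 * rng.standard_normal((3, 3)))
P /= P.sum()

pr, pc = P.sum(axis=1), P.sum(axis=0)
G = np.log(P)
# association index (11) with marginal weights
lam = G - (G @ pc)[:, None] - (pr @ G)[None, :] + pr @ G @ pc
# first-order approximation: the Pearson ratio minus 1 (a CA-style residual)
approx = P / np.outer(pr, pc) - 1
print(np.max(np.abs(lam - approx)))   # of second order, hence tiny
```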
a) In the case (w_j^C, w_i^R) = (p_+j, p_i+) in Lemma 2, λ_ij = λ(p_ij, p_+j, p_i+) ≈ p_ij/(p_i+ p_+j) − 1, which implies that CA (or TCA) is a first-order approximation of mwLRA (or mwTLRA), a result stated in Cuadras, Cuadras, and Greenacre (2006). Note that λ(p_ij, p_+j, p_i+) = λ(p_ij/(p_+j p_i+), p_+j, p_i+) is stated in Goodman (1996). Furthermore mwLRA (or mwTLRA) can be interpreted as prespecified TLRA by defining n*_ij = a_i n_ij b_j in Lemma 1 with prespecified weights (w_j^C, w_i^R) = (p_+j, p_i+), where p_+j = Σ_i n_ij/n, p_i+ = Σ_j n_ij/n and n = Σ_{i,j} n_ij.
b) In the case (a_i, b_j) = (I, J) and (w_j^C, w_i^R) = (1/J, 1/I) in Lemma 2, λ_ij = λ(p_ij, 1/J, 1/I) ≈ IJ p_ij − I p_i+ − J p_+j + 1, which implies that the bilinear expansion of the right side by TSVD (or SVD) is a first-order approximation of uwTLRA (or uwLRA); a familiar one known as FANOVA (factor analysis and analysis of variance), see Mandel (1971).

Benzécri's principle of distributional equivalence

Benzécri (1966, p. 56) presented his project of uncovering grammatical rules or patterns from texts by announcing his conceptual formulation of the principle of distributional equivalence as: "que des éléments distributionnellement proches soient proches sur le diagramme et réciproquement"; that is, "points that are near in distribution should appear near on the maps, and reciprocally". The notion of nearness of two distributions was introduced via the chi-squared distance (a weighted Euclidean distance) in (Benzécri 1973a, p. 150-152). In the statistical literature, the principle of distributional equivalence is discussed in terms of distances, see for example Escofier (1978), Kazmierczak (1985, 1987), Fichet (2009) and Greenacre and Lewi (2009); except in Nishisato (1984), who named it the principle of equivalent partitioning, and Choulakian (2006), who formulated it as invariance of TSVD results, see Theorem 1 below.

In the sequel, we separate Benzécri's idea of the distributional equivalence property from its principle, by observing that (Aitchison 1994, section 1)'s compositional equivalence is identical to Benzécri (1966)'s distributional equivalence.
Definition 2. Two nonnegative vectors x and y of the same size are distributionally/compositionally equivalent if x = Cy, where C is a positive constant.
Definition 3. A method of analysis satisfies the distributional/compositional equivalence property if, when it is applied to a table of nonnegative values in which two rows (or two columns) are proportional, the method produces maps where the two proportional rows (or the two proportional columns) coincide.

Remark 4.
a) Definition 3 is implicitly stated by Benzécri (1966, page 56) as: que des éléments distributionnellement proches soient proches sur le diagramme et réciproquement.
b) Definition 3 is a nonmetric variant of Kazmierczak (1985, 1987)'s definition of the "generalized distributional equivalence property", which states: let x and y be two proportional rows (or columns) in a nonnegative table of size I by J; if we replace x and y by two proportional vectors x_1 and y_1 such that x + y = x_1 + y_1, then the distances between the columns (or rows) are not changed. Proposition 1 is discussed by Kazmierczak (1985, 1987) within the framework of Euclidean and Riemannian geometries. The proof within the Taxicab framework is very simple.
Proposition 1. TCA, mwTLRA and TLRA satisfy the distributional/compositional equivalence property. That is, if the 1st row of a nonnegative table is proportional to the 2nd row, and λ_ij = Σ_{α=1}^k f_α(i) g_α(j) / δ_α, then f_α(1) = f_α(2). The proof is found in Appendix 2.
Theorem 1. Let X = (x_ij) for i = 1, 2, ..., I and j = 1, ..., J be a contingency table or a compositional data set. Suppose the first two rows of X are proportional, x_1j = C x_2j for j = 1, ..., J, where C is a strictly positive constant. TLRA of X with a priori weights (w_i^R, w_j^C) is equivalent to TLRA of X_merge = (x_ij^m) for i = (1+2), 3, ..., I and j = 1, ..., J, where x_(1+2)j^m = x_1j + x_2j, with weights

w_(1+2)^R = w_1^R + w_2^R and w_i^R for i = 3, ..., I.

Then

f_α(1) = f_α(2) = f_α^merge(1+2) and g_α(j) = g_α^merge(j) for α = 1, ..., k.

The proof is found in the Appendix.
Corollary 2. TLRA with prespecified weights, and in particular uwTLRA, does not satisfy Benzécri's principle of distributional equivalence.

Aitchison's principle of subcompositional coherence
In a paper entitled Principles of compositional data analysis, Aitchison (1994) summarized his main ideas.
The first is the compositional/distributional equivalence of two proportional compositional vectors that we stated in Definition 2; each such vector is quantified as a point in the unit simplex, see subsection 2.1.
The second is scale invariance of the units of each compositional vector of J parts, for which we provided an example in section 2.
The third, which seems to be the most important, is the principle of subcompositional coherence. We think the scale invariance of the odds ratios in Yule's principle is intimately related to the principle of subcompositional coherence in the following way. Let N be a count data set which can be interpreted both as a contingency table (the row variable has I levels and the column variable has J levels) and as a compositional data set (I individuals, also named samples, of J compositional parts); see for instance the rodent data set in Example 2. As a contingency table, we consider the scale invariant odds ratio of the rows (i, i1) and the columns (j, j1),

OR(i, i1; j, j1) = (p_ij / p_{i j1}) / (p_{i1 j} / p_{i1 j1}),   (22)

which compares the ratio of two odds: the odds of (j, j1) given level i and the odds of (j, j1) given level i1. Now suppose that the data set is compositional; then the odds ratio (22) compares the relative variation of the odds (j, j1) of two individuals (i, i1); for an individual i the scale independent representation of a compositional vector will be the odds p_ij / p_{i j1}. So the set of J(J − 1)/2 distinct but redundant odds will be a scale invariant representation of the compositional vector. That is, the relative values of the components are the basic building blocks in a composition; in addition, these relative values of components are unchanged for a subset of parts. This evident and important observation by Aitchison is named the principle of subcompositional coherence; in fact it can be considered a corollary to Yule's odds ratio.
Another consequence of considering the ratio p_ij / p_{i j1} is that, by fixing, without loss of generality, j1 = J, one gets the (J − 1) distinct basic odds p_ij / p_iJ; their logarithms form the additive logratio (alr) transformation. The centered logratio (clr) transformation is (log(p_ij / exp(G_i+))) of size J, where exp(G_i+) is the weighted geometric mean defined in subsection 4.1.
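The two transformations can be written out in a few lines of Python (illustrative composition; uniform weights, so exp(G_i+) reduces to the plain geometric mean):

```python
import numpy as np

p = np.array([0.5, 0.3, 0.15, 0.05])   # one compositional vector (J = 4 parts)

# alr: J-1 basic logratios against a reference part (here the last one)
alr = np.log(p[:-1] / p[-1])

# clr: logratios against the geometric mean g(p) = exp(mean(log p))
g = np.exp(np.log(p).mean())
clr = np.log(p / g)

assert alr.shape == (3,) and clr.shape == (4,)
assert np.isclose(clr.sum(), 0)        # clr coordinates sum to zero
# Both are scale invariant: multiplying p by a constant leaves them unchanged.
assert np.allclose(np.log(5 * p[:-1] / (5 * p[-1])), alr)
```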
Another important contribution of CoDA is Aitchison's geometry, which is the use of Euclidean geometry restricted to the simplex. This has the useful practical consequence that a lot of classical multivariate statistical theory can easily be applied in CoDA. But other geometries in the simplex have been studied, see for instance Nielsen and Sun (2017), which to our knowledge have not been applied in CoDA; TSVD falls in this category.
We emphasize the fact that only the principal components analysis of the covariances of the clr transformed compositional data is mathematically identical to Goodman's RC association model.

Isometric logratio transformation
Another active topic in CoDA is the isometric logratio (ilr) representation, introduced by Egozcue and Pawlowsky-Glahn (2005), and further developed by Fiserova and Hron (2011) and Hron, Filzmoser, and de Caritat (2017); it is developed within Euclidean geometry. Isometry is the study of group transformations that keep distances invariant between any two points in a metric space. Given that each metric space can be characterized by its algebraic group of isometries, isometries in the Euclidean space are characterized by the construction of an orthonormal basis, which can be topic based or data based. In CoDA, the ilr transformation is implicitly considered topic based, similar to the use of the discrete wavelet transform in image compression, see Strang (2019). Egozcue and Pawlowsky-Glahn (2005, 2019) presented a topic based "easily interpretable" ilr transformation named sequential binary partition (SBP) of the J parts (almost identical to the discrete Haar wavelet transform); the topic based SBP can be replaced by or compared with a data based SBP by TSVD.
A particular kind of ilr is the pivot logratio (plr) representation, where the orthonormal basis has a triangular shape apart from the first row. For example, for J = 5 parts, the orthonormal basis is

(1/√5) (1, 1, 1, 1, 1)
√(4/5) (1, −1/4, −1/4, −1/4, −1/4)
√(3/4) (0, 1, −1/3, −1/3, −1/3)
√(2/3) (0, 0, 1, −1/2, −1/2)
√(1/2) (0, 0, 0, 1, −1).

This has a very simple interpretation displaying the importance of the log parts in the following sense. We designate the 5 columns by the 5 ordered log parts x_(1), x_(2), x_(3), x_(4), x_(5) according to their importance. Then the second row shows that the first log part, x_(1), being the most important, opposes the rest and is then eliminated; the third row shows that the second most important log part, x_(2), opposes x_(3), x_(4), x_(5) and is then eliminated; and so on. The underlying idea is that each row starting with the second is made up of a heavyweight log part. In CA heavyweights are discussed by Benzécri (1979) and Lebart (1979), in TCA by Choulakian (2008b).
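Such a triangular basis is easy to generate for any J; the Python sketch below (a hypothetical helper, not part of any package) constructs it and checks orthonormality:

```python
import numpy as np

def plr_basis(J):
    """Pivot-logratio style orthonormal basis for J parts: the first row is
    the constant vector, each later row opposes one part to the rest."""
    V = np.zeros((J, J))
    V[0] = 1 / np.sqrt(J)
    for r in range(1, J):
        k = J - r                         # number of remaining parts
        V[r, r - 1] = np.sqrt(k / (k + 1))
        V[r, r:] = -np.sqrt(k / (k + 1)) / k
    return V

V = plr_basis(5)
assert np.allclose(V @ V.T, np.eye(5))    # orthonormality
assert np.isclose(V[1, 0], np.sqrt(4 / 5))
```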

Quantifying the intrinsic quality of a taxicab principal axis
We briefly review measures of the quality of a principal dimension in the Euclidean framework, then within the Taxicab framework. We think this comparison will be insightful, see Choulakian (2021).

Euclidean framework
Within the Euclidean framework, a commonly used measure of the quality of a principal dimension α of the residual matrix X_1 described in (13) is the proportion of variance explained (or inertia in the case of CA),

τ_1(α) = σ_α^2 / Σ_{β=1}^k σ_β^2.

Another variant is

τ_2(α) = σ_α^2 / Σ_{β=α}^k σ_β^2.

Note that τ_1(α) and τ_2(α) are extrinsic measures of the quality of the residuals in the residual matrix X_α, because they compare the intrinsic dispersion σ_α^2 of a principal axis to the total dispersion Σ_{β=1}^k σ_β^2 or to the partial residual dispersion Σ_{β=α}^k σ_β^2. Furthermore, we have the following evident result, Lemma 3, that should be compared with Lemma 4.
Lemma 3. a) 0 < τ_1(α) ≤ τ_2(α) ≤ 1 for α = 1, ..., k; b) τ_2(k) = 1.
In Lemma 3a the upper bound of 1 is not attained by τ_1(α) when k ≥ 2, while in Lemma 4a the upper bound of 1 is attained.

Taxicab framework
The Taxicab variant of τ_2 is particularly well adapted to TSVD; we interpret it as a new intrinsic measure of the quality of the signs of the residuals in the residual matrix X_α for α = 1, ..., k.
Let S ∪ S̄ = I be the optimal principal axis partition of the row index set, and similarly T ∪ T̄ = J be the optimal principal axis partition of the column index set, such that S = {i : a_α(i) > 0} = {i : v_α(i) > 0} and T = {j : b_α(j) > 0} = {j : u_α(j) > 0}, following the transition formulae (3, 4). Thus the data set is divided into 4 quadrants. Based on the equations (18), we define a new index quantifying the quality of the signs of the residuals in each quadrant of the α-th residual matrix X_α for α = 1, ..., k.
Definition 5. For α = 1, ..., k, the measure of the quality of the signs of the residuals in the quadrant E × F of the residual matrix X_α is

QSR_α(E, F) = Σ_{i∈E} Σ_{j∈F} x_α(i, j) / Σ_{i∈E} Σ_{j∈F} |x_α(i, j)|,

for E = S and S̄, and F = T and T̄; the overall index is QSR_α = δ_α / Σ_{i,j} |x_α(i, j)|. Sometimes we express it also in %.
We have the following easily proved
Lemma 4. a) |QSR_α(E, F)| ≤ 1 and 0 < QSR_α ≤ 1, and the upper bound of 1 is attained; b) for α = k, QSR_α = 1.
The interpretation of QSR α (E, F ) = ±1 is that in the quadrant E × F the residuals have one sign; and this is a signal for very influential cells or columns or rows; for an example see Choulakian (2021). So Lemma 4 provides a necessary and sufficient condition for QSR α = 1, which is not true for τ 1 (α) and τ 2 (α). Geometry plays its unique role.
Remark 6. The computation of the elements of QSR_α(+) and QSR_α(−) is done easily in the following way. We note that the α-th principal axes can be decomposed as

v_α = v_α+ + v_α− and u_α = u_α+ + u_α−,

where v_α+ = (v_α + 1_I)/2 and v_α− = (v_α − 1_I)/2, with 1_I a column vector of 1's of size I (and similarly u_α+ = (u_α + 1_J)/2 and u_α− = (u_α − 1_J)/2 with 1_J of size J). So, for instance,

QSR_α(S, T) = v_α+' X_α u_α+ / v_α+' abs(X_α) u_α+,

where abs(X_α) = (|x_α(i, j)|); the other quadrants are obtained by replacing v_α+ by v_α− or u_α+ by u_α−.
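In matrix form this reads as the short Python sketch below (random double-centered data; the criss-cross step is a simplified stand-in for the TaxicabCA search). It also checks the four balanced quadrants of equation (18):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((8, 6))
X -= X.mean(axis=0); X -= X.mean(axis=1, keepdims=True)   # double-centered X_1

u = np.ones(6)                       # criss-cross for the first axis
for _ in range(1000):
    a = X @ u; v = np.sign(a); v[v == 0] = 1
    b = X.T @ v; u_new = np.sign(b); u_new[u_new == 0] = 1
    if np.array_equal(u_new, u):
        break
    u = u_new
delta = np.abs(a).sum()

vp, vm = (v + 1) / 2, (v - 1) / 2    # v = v_+ + v_-  (indicator forms)
up, um = (u + 1) / 2, (u - 1) / 2
absX = np.abs(X)

def qsr(vv, uu):
    """Quality of the signs of the residuals in one quadrant."""
    return (vv @ X @ uu) / (vv @ absX @ uu)

# each signed quadrant sum equals delta/4, equation (18)
for vv, uu in [(vp, up), (vp, um), (vm, up), (vm, um)]:
    assert np.isclose(vv @ X @ uu, delta / 4)
print([round(qsr(vv, uu), 3) for vv, uu in [(vp, up), (vp, um), (vm, up), (vm, um)]])
```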

Zero counts
In CoDA and RC association models, data with zero values are changed into positive values so that the log transformation can be applied. Lubbe, Filzmoser, and Templ (2021) compare different strategies proposed in the literature. Researchers often distinguish compositional data that are counts (discrete) from continuous compositional data; an interesting discussion of counts compositional data, named lattice compositions, is given in Lovell, Chua, and McGrath (2020).
Here we present a flexible method that includes three different but related Bayesian estimators of relative frequencies in the presence of zero counts in contingency tables.
First, for a contingency table with zero counts (n_ij), Egozcue et al. (2015) consider the modified table (n_ij + 1/(IJ)), named the Perks point estimator of the cell counts. This is a particular instance of a Bayesian point estimator of the relative frequency of a cell with a Dirichlet prior, having the general form (n_ij + ω_j)/(n + t); see (Bernard 2005, equation 4).
Second, another solution is to consider the modified table (n_ij + 1), based on arguments in (Benzécri 1973b, p. 218) and Emerson and Stoto (1983), which we now describe. We consider the power family of transformations

T_α(n_ij) = n_ij^α for α > 0,
T_0(n_ij) = log_b(n_ij + c) for α = 0.
Find the values of the constants b and c such that the family of curves T_α(n_ij) passes through the two common points T_α(n_ij = 0) = 0 and T_α(n_ij = 1) = 1 for α ≥ 0. The solution is b = 2 and c = 1; we call this the BES approach, which is equivalent to the Bayes-Laplace estimator, see (Bernard 2005, subsection 3.2).
A third estimator, mentioned by (Bernard 2005, subsection 3.2), is the Jeffreys estimator, obtained with c = 1/2. How can we decide which, if any, is the best among these three estimators? A flexible approach that includes all three is to consider the transformation T(c) = log(n_ij + c) for c = 1, 1/2, 1/(IJ), and to find the value of c that maximizes the QSR_1 or QSR_1 + QSR_2 measure.
We note that for lattice compositions, it is sufficient to replace IJ by J above.
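The selection of the constant c can be organized as in the following sketch (in Python rather than the paper's R; the quality criterion shown is a toy stand-in, since the actual QSR criterion requires a taxicab decomposition):

```python
import numpy as np

def candidate_tables(N):
    """The three Bayesian-motivated replacements for a count table N
    with zeros: Bayes-Laplace (c = 1), Jeffreys (c = 1/2) and
    Perks (c = 1/(IJ)), applied inside the log transformation."""
    I, J = N.shape
    return {c: np.log(N + c) for c in (1.0, 0.5, 1.0 / (I * J))}

def best_constant(N, quality):
    """Return the c maximizing a user-supplied quality criterion,
    e.g. QSR_1 or QSR_1 + QSR_2 computed from a taxicab
    decomposition of the double-centered log table."""
    tables = candidate_tables(N)
    return max(tables, key=lambda c: quality(tables[c]))

# Toy stand-in criterion: negative total dispersion of the
# double-centered log table (NOT the QSR criterion itself).
def toy_quality(L):
    R = L - L.mean(axis=1, keepdims=True) - L.mean(axis=0, keepdims=True) + L.mean()
    return -np.abs(R).sum()

N = np.array([[4.0, 0.0, 1.0],
              [2.0, 3.0, 0.0]])
c_star = best_constant(N, toy_quality)
```

The point of the design is that the three Bayesian estimators become three values of a single tuning constant, so the choice among them reduces to a one-dimensional maximization of the chosen quality index.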
Remark 7. Another interesting observation is the following. Suppose Z = (z_ij) is an incidence matrix, a presence-absence data set, where z_ij = 0 means level j is absent in the i-th individual and z_ij = 1 means level j is present in the i-th individual. CA (or TCA) is a popular method for the analysis of such tables; see for an example Choulakian and Abou-Samra (2020). Putting p_ij = z_ij / Σ_{i,j} z_ij and supposing that the marginals p_{+j} > 0 and p_{i+} > 0, the CA (or TCA) data reconstruction formula is p_ij = p_{i+} p_{+j} (1 + Σ_α λ_α f_α(i) g_α(j)). Now suppose we apply uwLRA (or uwTLRA); observing that log_2(z_ij + 1) = z_ij, the data reconstruction formula is z_ij = z̄_{i·} + z̄_{·j} − z̄_{··} + Σ_α σ_α u_α(i) v_α(j), which is the first-order approximation of uwTLRA (or uwLRA) found in Remark 3b: a familiar one, known as FANOVA (factor analysis and analysis of variance); see Mandel (1971).
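The observation that log_2(z_ij + 1) = z_ij for binary entries can be checked directly; the small sketch below (illustrative only) also forms the double-centered incidence matrix whose SVD yields the FANOVA interaction terms:

```python
import numpy as np

Z = np.array([[1, 0, 1],
              [0, 1, 1],
              [1, 1, 0]], dtype=float)

# For presence-absence data the log2(z + 1) transform is the identity.
L = np.log2(Z + 1)

# Double-centering removes the grand mean and the row and column
# effects; the SVD of R gives the multiplicative (FANOVA) terms.
R = L - L.mean(axis=1, keepdims=True) - L.mean(axis=0, keepdims=True) + L.mean()
```

By construction every row sum and column sum of R vanishes, which is the additive part of the FANOVA decomposition.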
Remark 8. Here we summarize how to analyze a contingency table or a lattice distribution by CA, mwTLRA and uwTLRA; there are 4 cases.
Case 1a) There are no zero counts and no two proportional rows or columns: then we can analyze N by CA, mwTLRA or uwTLRA.
Case 1b) There are no zero counts but there are at least two proportional rows or columns: then we can analyze N or N_merged by CA or mwTLRA, and only N by uwTLRA.
Case 2a) There are zero counts and no two proportional rows or columns: then we can analyze N by CA, and (N + c 1_I 1'_J) by mwTLRA or uwTLRA.
Case 2b) There are zero counts and at least two proportional rows or columns: then we can analyze N or N_merged by CA, (N_merged + c 1_I 1'_J) by mwTLRA, and (N_modified + c 1_I 1'_J) by uwTLRA, where N_modified = (n*_ij) is computed in two steps so as to satisfy Lemma 1 and Proposition 2. In Step 1, n^1_ij = n_ij / gcd_i, where gcd_i is the greatest common divisor of the strictly positive counts of the i-th row (n_ij > 0 for j = 1, ..., J and i fixed); in Step 2, n*_ij = n^1_ij / gcd_j, where gcd_j is the greatest common divisor of the strictly positive entries of the j-th column (n^1_ij > 0 for i = 1, ..., I and j fixed).
Step 1 is needed in case there are at least 2 proportional rows; Step 2 is needed in case there are at least 2 proportional columns. The rodent data set in Example 2 is such an example: it contains groups of 7 proportional rows, so Step 2 is not needed and n*_ij = n^1_ij.
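The two-step gcd reduction above can be sketched as follows (an illustrative Python version; the paper computes N_modified with the R package numbers):

```python
from functools import reduce
from math import gcd

import numpy as np

def modify_counts(N):
    """Step 1: divide each row by the gcd of its positive counts.
    Step 2: divide each column by the gcd of its positive entries."""
    M = np.array(N, dtype=int)
    for i in range(M.shape[0]):          # Step 1: rows
        pos = M[i][M[i] > 0]
        if pos.size:
            M[i] //= reduce(gcd, pos)
    for j in range(M.shape[1]):          # Step 2: columns
        pos = M[:, j][M[:, j] > 0]
        if pos.size:
            M[:, j] //= reduce(gcd, pos)
    return M

# Rows 1 and 2 are proportional (row 2 = 2 x row 1); after Step 1 they
# become identical, as required for the merging of Theorem 2.
N = np.array([[2, 0, 4],
              [4, 0, 8],
              [3, 5, 0]])
M = modify_counts(N)
```

Zero cells are untouched by the integer divisions, so the pattern of zeros of N_modified coincides with that of N.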

Examples
Here we consider the analysis of two tables.

Food compositional data
The food compositional data set, displayed in Appendix 3, is of size 25 by 9 and was analyzed in quite some detail by uwLRA in Pawlowsky-Glahn and Egozcue (2011). These data are percentages of consumption of 9 different kinds of food in 25 countries in Europe in the early eighties. The 9 kinds of food are: red meat (RM); white meat (WM); fish (F); eggs (E); milk (M); cereals (C); starch (S); nuts (N); fruit and vegetables (FV). The 25 countries are divided into 16 western (w) and 9 eastern (e) countries. In Table 1, the TCA QSR_1 = 77.89% and mwTLRA QSR_1 = 80.1% values are clearly higher than the uwTLRA QSR_1 = 68.69% value, so we choose either TCA or mwTLRA. On the maps displayed in Figures 1, 2 and 3, the 9 food kinds are represented by their symbols and the 25 countries by their symbols eastern (e) or western (w). Applying the R code in Appendices 2 and 3, one sees that the mwTLRA and TCA maps are very similar and discriminate the eastern and the western countries much better than the uwTLRA map does: all eastern countries are clustered in the third quadrant, except one located in the first quadrant. We also note that the uwTLRA map is very similar to the LRA map in Pawlowsky-Glahn and Egozcue (2011).
In Table 1, we also present the first three taxicab dispersion measures δ_α for α = 1, 2, 3. δ_1 is of the same order for the three methods; however, the QSR_1 values differ, which shows that QSR_α is not related to δ_α in general.

Rodent abundance data
We consider the rodent data set of size 28 by 9 found in the TaxicabCA package in R. This is an abundance data set of 9 kinds of rats in 28 cities in California. Choulakian (2017) analyzed it by comparing the CA and TCA maps; Choulakian (2021) showed that it has a quasi-2-blocks structure. Let N be the original data set of size 28 by 9; the following R command calculates the proportion of zero counts in N:

> sum(N == 0)/length(N)
[1] 0.6626984

The function CombineCollinearRowsCols in the TaxicabCA package merges the rows and the columns of N which are proportional; we see that the size of Nmerged is 21 by 9.
> Nmerged <- CombineCollinearRowsCols(N, rows = TRUE, cols = TRUE)
> dim(Nmerged)
[1] 21 9

Here, we are in Case 2b of Remark 8: we can analyze N or N_merged by CA, (N_merged + c 1_I 1'_J) by mwTLRA, and (N_modified + c 1_I 1'_J) by uwTLRA. N_modified can be calculated using the package numbers by Borchers (2021). Table 2 displays the numerical values of QSR_α and δ_α for α = 1, 2 for the TCA, uwTLRA and mwTLRA methods. For the last two methods, the optimal choice of the constant is c = 1. We summarize the numerical results in Table 2: first, it makes sense that the dispersion values of the uwTLRA and mwTLRA methods using the BES (Bayes-Laplace) estimator, c = 1, are the smallest among the three estimators BES, Jeffreys and Perks.
Second, for the choice among the three methods TCA, uwTLRA and mwTLRA, QSR_1 is not informative, while QSR_2 is informative: the choice is clearly TCA or mwTLRA.
In conclusion, we prefer TCA, because it is unaffected by the presence of zeros. This example shows that the addition of a constant c > 0 to count data with zero cells complicates the application of logratio methods. On this point we completely agree with Egozcue, Pawlowsky-Glahn, and Gloor (2018).

Conclusion
In this paper we, as dwarfs on the shoulders of the three giants Benzécri, Goodman and Aitchison, attempted to see further by relating theory with practice. First, we reviewed the principles on which three interrelated, well-developed methods for the analysis and visualization of contingency tables and compositional data are erected: CA based on Benzécri's principle of distributional equivalence, Goodman's RC association model based on Yule's principle of scale invariance, and CoDA based on Aitchison's principle of subcompositional coherence.
Second, we introduced a novel index named the intrinsic measure of the quality of the signs of the residuals (QSR) for the choice of the preprocessing (double-centering), and consequently of the method among TCA, mwTLRA and uwTLRA. The criterion is based on taxicab singular value decomposition (TSVD) on which the package TaxicabCA in R is developed. We presented a minimal R script that can be executed to obtain the numerical results and the maps in this paper.
Third, we introduced a flexible method based on the QSR index for the choice of the constant to be added to contingency tables with zero counts so that TLRA methods can be applied.
We conclude by re-citing (Tukey 1977, p. 400): "the general maxim - it is a rare thing that a specific body of data tells us clearly enough how it itself should be analyzed - applies to choice of re-expression for two-way analysis". In this paper we studied three choices of re-expression for contingency tables and compositional data: TCA, mwTLRA and uwTLRA. Do all roads lead to Rome? In a unipolar world, yes; but in a multipolar world, no. In the analysis of contingency tables and compositional data, the world is multipolar.
Theorem 2. Let X = (x_ij), for i = 1, 2, ..., I and j = 1, ..., J, be a contingency table or a compositional data set. Suppose the first two rows of X are proportional, x_1j = C x_2j for j = 1, ..., J, where C is a strictly positive constant. Then TLRA of X with a priori weights (w^R_i, w^C_j) is equivalent to TLRA of X_merge = (x^m_ij), for i = (1+2), 3, ..., I and j = 1, ..., J, with weights (w^Rm_i, w^C_j), where x^m_(1+2)j = x_1j + x_2j and w^Rm_(1+2) = w^R_1 + w^R_2, and x^m_ij = x_ij and w^Rm_i = w^R_i for i = 3, ..., I.
Proof. We sketch the steps of a proof: a) We show that λ^merge_(1+2)j = λ_2j.
Let us consider separately the four terms T_i, i = 1, 2, 3, 4, on the right-hand side of equation (A1).
These terms reduce to w^R_i log(p_ij) + (w^R_1 + w^R_2) log(C + 1) + w^R_1 log(C). b) The proof is similar to a) above. c) We use the three steps of section 4.2, where Step 1 is double-centering.
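The key observation behind step a) can be sketched as follows (our reconstruction under the theorem's proportionality assumption, not the paper's full argument): since x_{1j} = C x_{2j},

```latex
x^{m}_{(1+2)j} = x_{1j} + x_{2j} = (C+1)\,x_{2j}
\quad\Longrightarrow\quad
\log x^{m}_{(1+2)j} = \log x_{2j} + \log(C+1).
```

The merged row's log entries thus differ from those of row 2 by the constant log(C + 1), a pure row effect that is removed by the double-centering of Step 1; what remains is the reweighting w^Rm_(1+2) = w^R_1 + w^R_2.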