Decomposition of Symmetry into Ordinal Quasi-Symmetry and Marginal Equimoment for Multi-way Tables

For the analysis of square contingency tables with ordered categories, Agresti (1983) introduced the linear diagonals-parameter symmetry (LDPS) model. Tomizawa (1991) considered an extended LDPS (ELDPS) model, which has one more parameter than the LDPS model. These models are special cases of Caussinus' (1965) quasi-symmetry (QS) model. Caussinus showed that the symmetry (S) model is equivalent to the QS model and the marginal homogeneity (MH) model holding simultaneously. For square tables with ordered categories, Agresti (2002, p.430) gave a decomposition of the S model into the ordinal quasi-symmetry and MH models. This paper proposes decompositions that differ from those of Caussinus and Agresti. It gives (i) two kinds of decomposition theorems of the S model for two-way tables, (ii) extensions of the LDPS and ELDPS models, together with a further generalized model, for multi-way tables, and (iii) three kinds of decomposition theorems of the S model into these models and marginal equimoment models for multi-way tables. The proposed decompositions may be useful when it is reasonable to assume an underlying multivariate normal distribution.


Introduction
Suppose that an $R \times R$ square contingency table has the same categories in the row classification as in the column classification. Let $X_1$ and $X_2$ denote the row and column variables, respectively, and let $p_{ij}$ denote the probability that an observation will fall in the $i$th row and $j$th column of the table ($i, j = 1, \ldots, R$). Thus, $\Pr(X_1 = i, X_2 = j) = p_{ij}$. The symmetry (S) model is defined as

$$p_{ij} = \psi_{ij} \quad (i, j = 1, \ldots, R),$$

where $\psi_{ij} = \psi_{ji}$ (Bowker, 1948; Bishop, Fienberg, and Holland, 1975, p.282). This indicates that the probability that an observation will fall in the $(i, j)$ cell, $i \neq j$, is equal to the probability that the observation falls in the symmetric $(j, i)$ cell. Caussinus (1965) considered the quasi-symmetry (QS) model, defined by

$$p_{ij} = \alpha_i \beta_j \psi_{ij} \quad (i, j = 1, \ldots, R),$$

where $\psi_{ij} = \psi_{ji}$. A special case of this model with $\{\alpha_i = \beta_i\}$ is the S model. Denote the odds ratio for rows $i$ and $j$ ($> i$) and columns $s$ and $t$ ($> s$) by $\theta_{(i<j;s<t)}$, that is, $\theta_{(i<j;s<t)} = (p_{is} p_{jt})/(p_{js} p_{it})$. Using the odds ratios, the QS model is further expressed as

$$\theta_{(i<j;s<t)} = \theta_{(s<t;i<j)} \quad (1 \le i < j \le R;\ 1 \le s < t \le R).$$

Therefore, the QS model is characterized by the symmetry of the odds ratios. For the QS model, also see, e.g., Bishop et al. (1975, p.286), Goodman (1979a), Darroch and McCloud (1986), and Agresti (2002, p.425).
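To make the odds-ratio characterization concrete, the following Python sketch builds a table of the QS form $p_{ij} = \alpha_i \beta_j \psi_{ij}$ from arbitrary positive parameters and checks that $\theta_{(i<j;s<t)} = \theta_{(s<t;i<j)}$ for all index pairs. The helper names (`qs_table`, `odds_ratio`, `has_symmetric_odds_ratios`) are illustrative only, indices are 0-based, and the parameter values are random rather than taken from any data set.

```python
import math
import random


def qs_table(alpha, beta, psi):
    """Return p_ij proportional to alpha_i * beta_j * psi_ij (QS form)."""
    R = len(alpha)
    raw = [[alpha[i] * beta[j] * psi[i][j] for j in range(R)] for i in range(R)]
    total = sum(sum(row) for row in raw)
    return [[x / total for x in row] for row in raw]


def odds_ratio(p, i, j, s, t):
    """theta_(i<j; s<t) = (p_is * p_jt) / (p_js * p_it)."""
    return (p[i][s] * p[j][t]) / (p[j][s] * p[i][t])


def has_symmetric_odds_ratios(p, rel_tol=1e-9):
    """Check theta_(i<j; s<t) == theta_(s<t; i<j) for all i < j and s < t."""
    R = len(p)
    for i in range(R):
        for j in range(i + 1, R):
            for s in range(R):
                for t in range(s + 1, R):
                    if not math.isclose(odds_ratio(p, i, j, s, t),
                                        odds_ratio(p, s, t, i, j), rel_tol=rel_tol):
                        return False
    return True


random.seed(1)
R = 4
psi = [[0.0] * R for _ in range(R)]
for i in range(R):
    for j in range(i, R):
        psi[i][j] = psi[j][i] = random.uniform(0.5, 2.0)  # symmetric psi
alpha = [random.uniform(0.5, 2.0) for _ in range(R)]
beta = [random.uniform(0.5, 2.0) for _ in range(R)]

p = qs_table(alpha, beta, psi)
print(has_symmetric_odds_ratios(p))  # True: the QS form forces symmetric odds ratios
```

Multiplying a single off-diagonal cell by a constant breaks the equality, because that cell enters some $\theta_{(i<j;s<t)}$ but not its mirror image $\theta_{(s<t;i<j)}$.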
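The marginal homogeneity (MH) model is defined by

$$p_{i\cdot} = p_{\cdot i} \quad (i = 1, \ldots, R),$$

where $p_{i\cdot} = \sum_{t=1}^{R} p_{it}$ and $p_{\cdot i} = \sum_{s=1}^{R} p_{si}$ (Stuart, 1955; Bishop et al., 1975, p.293). This indicates that the row marginal distribution is identical to the column marginal distribution.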
For square tables with ordered categories, Agresti (1984, p.203) proposed the linear diagonals-parameter symmetry (LDPS) model, defined by

$$p_{ij} = \delta^{j} \psi_{ij} \quad (i, j = 1, \ldots, R),$$

where $\psi_{ij} = \psi_{ji}$. A special case of this model obtained by putting $\delta = 1$ is the S model. Note that the LDPS model is a special case of the diagonals-parameter symmetry model of Goodman (1979b). The LDPS model may also be expressed as

$$p_{ij} = \begin{cases} \delta^{j-i} \phi_{ij} & (i < j), \\ \phi_{ij} & (i \ge j), \end{cases}$$

where $\phi_{ij} = \phi_{ji}$. Moreover, it may be expressed as

$$\frac{p_{ij}}{p_{ji}} = \delta^{j-i} \quad (i < j).$$

This indicates that the probability that an observation will fall in the $(i, j)$ cell, $i < j$, is $\delta^{j-i}$ times the probability that the observation falls in the $(j, i)$ cell. Moreover, Agresti (2002, p.429) considered the ordinal quasi-symmetry (OQS) model, defined by

$$p_{ij} = \delta^{u_j} \psi_{ij} \quad (i, j = 1, \ldots, R),$$

where $\psi_{ij} = \psi_{ji}$ and $u_1 \le \cdots \le u_R$ denote ordered scores assigned to both the rows and the columns. Note that the OQS model with integer scores $\{u_i = i\}$ is identical to the LDPS model. Tomizawa (1991) considered the extended LDPS (ELDPS) model, defined by

$$p_{ij} = \delta^{j} \gamma^{j^2} \psi_{ij} \quad (i, j = 1, \ldots, R),$$

where $\psi_{ij} = \psi_{ji}$; under this model, $p_{ij}/p_{ji} = \delta^{j-i} \gamma^{j^2 - i^2}$.

Consider two variables $U$ and $V$ having a bivariate normal distribution with means $\mu_1$ and $\mu_2$, variances $\sigma_1^2$ and $\sigma_2^2$, and correlation $\rho$, and let $f(u, v)$ denote its density function. Agresti (1983) described the relationship between the LDPS model and this bivariate normal distribution as follows. When $\sigma_1^2 = \sigma_2^2$, $f(u, v)/f(v, u)$ has the form $\xi^{v-u}$ for some constant $\xi$, and hence the LDPS model may be appropriate for a square ordinal table if it is reasonable to assume an underlying bivariate normal distribution with equal marginal variances. Tomizawa (1991) pointed out that the ELDPS model, rather than the LDPS model, would be appropriate if it is reasonable to assume an underlying bivariate normal distribution that does not require the equality of marginal variances.

Caussinus (1965) showed that the S model holds if and only if both the QS and MH models hold for square contingency tables. Bishop et al. (1975, p.287) and Bhapkar and Darroch (1990) gave decompositions of the S model for three-way tables and for multi-way tables, respectively. Agresti (2002, p.429) showed that the S model holds if and only if both the OQS and MH models hold. Note that the LDPS (OQS) and ELDPS models are special cases of the QS model. Since the OQS model imposes stronger restrictions than the QS model, we are interested in decomposing the S model into such an ordinal model and a model with weaker restrictions than the MH model.
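The normal-distribution argument can be checked numerically. The plain-Python sketch below (function names and parameter values are illustrative) evaluates $\log f(u,v) - \log f(v,u)$ for a bivariate normal density. Working out the quadratic form gives $\log\{f(u,v)/f(v,u)\} = A(v-u) + B(v^2-u^2)$, with $B = 0$ when $\sigma_1 = \sigma_2$; at integer scores this is the LDPS form $\xi^{v-u}$ in the equal-variance case and the ELDPS form $\delta^{j-i}\gamma^{j^2-i^2}$ otherwise. The closed forms for $A$ and $B$ in `ratio_coefficients` are our own derivation, not formulas quoted from Agresti (1983) or Tomizawa (1991).

```python
import math


def log_bvn_density(u, v, mu1, mu2, s1, s2, rho):
    """Log density of a bivariate normal (means mu1, mu2; SDs s1, s2; correlation rho)."""
    a = (u - mu1) / s1
    b = (v - mu2) / s2
    q = (a * a - 2.0 * rho * a * b + b * b) / (1.0 - rho * rho)
    return -math.log(2.0 * math.pi * s1 * s2 * math.sqrt(1.0 - rho * rho)) - q / 2.0


def log_ratio(u, v, mu1, mu2, s1, s2, rho):
    """log f(u, v) - log f(v, u)."""
    return (log_bvn_density(u, v, mu1, mu2, s1, s2, rho)
            - log_bvn_density(v, u, mu1, mu2, s1, s2, rho))


def ratio_coefficients(mu1, mu2, s1, s2, rho):
    """A and B in log f(u,v)/f(v,u) = A*(v-u) + B*(v^2-u^2), from the quadratic form."""
    d = 1.0 - rho * rho
    w1 = (mu1 / s1 ** 2 - rho * mu2 / (s1 * s2)) / d  # first entry of Sigma^{-1} mu
    w2 = (mu2 / s2 ** 2 - rho * mu1 / (s1 * s2)) / d  # second entry of Sigma^{-1} mu
    A = w2 - w1
    B = (1.0 / s1 ** 2 - 1.0 / s2 ** 2) / (2.0 * d)
    return A, B


for s2 in (1.0, 1.5):  # equal and unequal marginal SDs
    A, B = ratio_coefficients(0.3, 0.8, 1.0, s2, 0.4)
    for u, v in [(1, 2), (1, 4), (2, 5), (3, 3)]:
        lhs = log_ratio(u, v, 0.3, 0.8, 1.0, s2, 0.4)
        assert math.isclose(lhs, A * (v - u) + B * (v * v - u * u), abs_tol=1e-9)
    print(f"s2={s2}: delta = exp(A) = {math.exp(A):.4f}, gamma = exp(B) = {math.exp(B):.4f}")
```

With equal variances the coefficient reduces to $A = (\mu_2 - \mu_1)/\{\sigma^2(1-\rho)\}$, so $\xi = e^{A}$ exceeds 1 exactly when $\mu_2 > \mu_1$.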
In this paper we propose other decompositions of the S model and give some extended models for multi-way tables. Section 2 proposes two kinds of decomposition theorems of the S model for two-way tables. Sections 3 and 4 propose extended models corresponding to the LDPS and ELDPS models, together with a further generalized model, for multi-way tables, and give decomposition theorems of the S model.

Ordinal Quasi-Symmetry and Marginal Equimoment

Let $g(k)$, $k = 1, \ldots, R$, be a specified monotonic function of $k$. Consider the marginal mean equality (ME) model, defined by

$$\mu_1 = \mu_2,$$

where $\mu_t = E(g(X_t))$ ($t = 1, 2$). This indicates that the mean of $g(X_1)$ is equal to the mean of $g(X_2)$. We obtain the following decomposition of the S model.

Theorem 2.1 The S model holds if and only if both the LDPS and ME models hold.

Proof. If the S model holds, then both the LDPS and ME models hold. Conversely, assume that both the LDPS and ME models hold; we shall show that the S model holds. Let $\{p^*_{ij}\}$ denote the cell probabilities satisfying both models, and let $K(\cdot, \cdot)$ denote the Kullback-Leibler information. Expressing the LDPS and ME models for $\{p^*_{ij}\}$ relative to a fixed distribution $\pi$ shows that $\{p^*_{ij}\}$ uniquely minimizes $K(p, \pi)$ (see Darroch and Ratcliff (1972); Darroch and Speed (1983); Bhapkar and Darroch (1990)). A second set of probabilities $\{p^{**}_{ij}\}$, which satisfies the S model, also uniquely minimizes $K(p, \pi)$. Therefore $p^*_{ij} = p^{**}_{ij}$ for all $i$ and $j$; namely, the S model holds. The proof is completed.
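The mechanism behind Theorem 2.1 can also be seen through a direct sign argument (our own illustration, not the authors' proof): under the LDPS model, $\mu_2 - \mu_1 = \sum_{i<j}\{g(j)-g(i)\}(p_{ij}-p_{ji})$ and $p_{ij} - p_{ji} \propto \psi_{ij}(\delta^{j} - \delta^{i})$, so for a strictly increasing $g$ every term has the sign of $\delta - 1$, and the ME model can hold only when $\delta = 1$. The Python sketch below checks this for random symmetric $\psi_{ij}$; the function names and the score vectors in `g` are hypothetical.

```python
import random


def ldps_table(delta, psi):
    """p_ij proportional to delta**j * psi_ij, categories numbered 1..R."""
    R = len(psi)
    raw = [[delta ** (j + 1) * psi[i][j] for j in range(R)] for i in range(R)]
    total = sum(sum(row) for row in raw)
    return [[x / total for x in row] for row in raw]


def mean_difference(p, g):
    """mu_2 - mu_1 = E g(X_2) - E g(X_1) for a square table p."""
    R = len(p)
    mu1 = sum(g[i] * p[i][j] for i in range(R) for j in range(R))
    mu2 = sum(g[j] * p[i][j] for i in range(R) for j in range(R))
    return mu2 - mu1


random.seed(2)
R = 5
psi = [[0.0] * R for _ in range(R)]
for i in range(R):
    for j in range(i, R):
        psi[i][j] = psi[j][i] = random.uniform(0.1, 1.0)

for g in ([1, 2, 3, 4, 5], [0.0, 1.0, 3.0, 7.0, 8.0]):  # two increasing score choices
    for delta in (0.7, 1.0, 1.3):
        d = mean_difference(ldps_table(delta, psi), g)
        print(f"g={g}, delta={delta}: mu2 - mu1 = {d:+.5f}")
```

The sign of $\mu_2 - \mu_1$ follows that of $\delta - 1$ for both score choices, and it is zero only at $\delta = 1$, where the table is symmetric.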
Next, consider the marginal variance equality (VE) model, defined by

$$\sigma_1^2 = \sigma_2^2,$$

where $\sigma_t^2 = \mathrm{var}(g(X_t))$. This indicates that the variance of $g(X_1)$ is equal to the variance of $g(X_2)$. We obtain another decomposition of the S model as follows.
Theorem 2.2 The S model holds if and only if all the ELDPS, ME and VE models hold.
The proof is omitted because it is obtained in a similar way to the proof of Theorem 2.1. Theorems 2.1 and 2.2 may be useful for identifying the reason for a poor fit when the S model fits the data poorly.
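A similar check illustrates Theorem 2.2 for $g(k) = k$ (again our own illustration, not the omitted proof). Write $h_k = k\log\delta + k^2\log\gamma$, so that the ELDPS model gives $p_{ij}/p_{ji} = e^{h_j - h_i}$. Given equal means, the VE model is equivalent to $E(X_1^2) = E(X_2^2)$, and

$\log\delta\,(\mu_2-\mu_1) + \log\gamma\,\{E(X_2^2) - E(X_1^2)\} = \sum_{i<j}(h_j-h_i)(p_{ij}-p_{ji})$,

a sum of nonnegative terms. So when the ME and VE models hold, the sum vanishes, which (with positive $\psi_{ij}$) forces all $h_k$ to be equal and hence, for $R \ge 3$, $\delta = \gamma = 1$. The sketch below, with hypothetical parameter values and helper names, verifies the identity and the sign of each term numerically.

```python
import math
import random


def eldps_table(delta, gamma, psi):
    """p_ij proportional to delta**j * gamma**(j**2) * psi_ij, categories 1..R."""
    R = len(psi)
    raw = [[delta ** (j + 1) * gamma ** ((j + 1) ** 2) * psi[i][j] for j in range(R)]
           for i in range(R)]
    total = sum(sum(row) for row in raw)
    return [[x / total for x in row] for row in raw]


def moment_difference(p, power):
    """E(X_2**power) - E(X_1**power) with integer scores g(k) = k."""
    R = len(p)
    return sum(((j + 1) ** power - (i + 1) ** power) * p[i][j]
               for i in range(R) for j in range(R))


random.seed(3)
R = 5
psi = [[0.0] * R for _ in range(R)]
for i in range(R):
    for j in range(i, R):
        psi[i][j] = psi[j][i] = random.uniform(0.1, 1.0)

delta, gamma = 1.2, 0.95
a, b = math.log(delta), math.log(gamma)
p = eldps_table(delta, gamma, psi)
h = [a * k + b * k * k for k in range(1, R + 1)]  # h[k-1] belongs to category k

lhs = a * moment_difference(p, 1) + b * moment_difference(p, 2)
terms = [(h[j] - h[i]) * (p[i][j] - p[j][i]) for i in range(R) for j in range(i + 1, R)]
print(math.isclose(lhs, sum(terms)), min(terms) >= 0.0)
```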

Extension to Three-way Tables
We shall extend the LDPS and ELDPS models to three-way tables and consider a generalized model. Furthermore, we shall give some decomposition theorems of the S model for three-way tables.

Models
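For an $R \times R \times R$ contingency table, let $X_1$, $X_2$, and $X_3$ denote the first, second, and third variables, respectively, and let $p_{ijk}$ denote the probability that an observation will fall in the $(i, j, k)$ cell of the table ($1 \le i, j, k \le R$). The symmetry model is defined by

$$p_{ijk} = \psi_{ijk} \quad (1 \le i, j, k \le R),$$

where $\psi_{ijk} = \psi_{ikj} = \psi_{jik} = \psi_{jki} = \psi_{kij} = \psi_{kji}$ (Bishop et al., 1975, p.301). We shall denote this model by S-3. First, consider a model defined by

$$p_{ijk} = \alpha_1^{i} \alpha_2^{j} \alpha_3^{k} \psi_{ijk} \quad (1 \le i, j, k \le R),$$

where $\psi_{ijk}$ is symmetric in its three indices as in the S-3 model. Without loss of generality we may set, e.g., $\alpha_3 = 1$. This model may also be expressed as

$$\frac{p_{ijk}}{p_{lmn}} = \alpha_1^{i-l} \alpha_2^{j-m} \alpha_3^{k-n},$$

where $(l, m, n)$ is any permutation of $(i, j, k)$. This model is an extension of the LDPS model to three-way tables, and we shall denote it by LDPS-3. For example, at a fixed level of $X_3$, $p_{ijk}/p_{jik} = (\alpha_2/\alpha_1)^{j-i}$; namely, the departure from symmetry grows exponentially as the difference $j - i$ between the levels of $X_1$ and $X_2$ increases. Consider now three variables $U$, $V$, and $W$ having a joint normal distribution with means $\mu_1$, $\mu_2$, and $\mu_3$, a common variance $\sigma^2$, and a common correlation $\rho$, and let $f(u, v, w)$ denote its density function. Then $f(u, v, w)/f(l, m, n)$, where $(l, m, n)$ is any permutation of $(u, v, w)$, has the form $\xi_1^{u-l} \xi_2^{v-m} \xi_3^{w-n}$ for some constants $\xi_1$, $\xi_2$, and $\xi_3$. Hence, if it is reasonable to assume this underlying three-variate normal distribution, the LDPS-3 model may be appropriate for an ordinal three-way table (see Section 7).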
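The claim about the three-variate normal distribution can be verified along the same lines. When all variances equal $\sigma^2$ and all correlations equal $\rho$, the inverse covariance matrix is $(I - cJ)/\{\sigma^2(1-\rho)\}$ with $c = \rho/\{1+(T-1)\rho\}$, and only the linear part of the quadratic form changes under a permutation of the arguments, giving $\log\{f(x)/f(x')\} = \sum_t \mu_t (x_t - x'_t)/\{\sigma^2(1-\rho)\}$, i.e., $\xi_t = \exp\{\mu_t/(\sigma^2(1-\rho))\}$. This closed form is our own derivation; the Python sketch below (hypothetical parameter values, density written up to a constant that cancels in the ratio) checks it for every permutation of one point.

```python
import itertools
import math


def log_density_equicorr(x, mu, sigma2, rho):
    """Multivariate normal log density up to an additive constant, with common
    variance sigma2 and common correlation rho, so that
    Sigma^{-1} = (I - c J) / (sigma2 * (1 - rho)),  c = rho / (1 + (T - 1) rho)."""
    T = len(x)
    c = rho / (1.0 + (T - 1) * rho)
    d = [xt - mt for xt, mt in zip(x, mu)]
    q = (sum(di * di for di in d) - c * sum(d) ** 2) / (sigma2 * (1.0 - rho))
    return -q / 2.0


mu, sigma2, rho = [0.2, 0.9, 1.4], 1.3, 0.35
log_xi = [m / (sigma2 * (1.0 - rho)) for m in mu]  # log xi_t

x = (1, 2, 3)
for xp in itertools.permutations(x):
    lhs = log_density_equicorr(x, mu, sigma2, rho) - log_density_equicorr(xp, mu, sigma2, rho)
    rhs = sum(lx * (a - b) for lx, a, b in zip(log_xi, x, xp))
    print(xp, math.isclose(lhs, rhs, abs_tol=1e-12))
```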
Secondly, consider a model defined by

$$p_{ijk} = \alpha_1^{i} \alpha_2^{j} \alpha_3^{k} \beta_1^{i^2} \beta_2^{j^2} \beta_3^{k^2} \psi_{ijk} \quad (1 \le i, j, k \le R),$$

where $\psi_{ijk}$ is symmetric in its three indices. Without loss of generality we may set, e.g., $\alpha_3 = \beta_3 = 1$. This model is an extension of the ELDPS model to three-way tables, because for two-way tables the corresponding model implies that $p_{ij}/p_{ji}$ has the form $\delta^{j-i} \gamma^{j^2 - i^2}$ for some constants $\delta$ and $\gamma$. We shall denote this model by ELDPS-3. If it is reasonable to assume an underlying three-variate normal distribution that does not require the equality of marginal variances, then the ELDPS-3 model, rather than the LDPS-3 model, may be appropriate for an ordinal three-way table (see Section 7).
Finally, consider a model defined by

$$p_{ijk} = \alpha_1^{i} \alpha_2^{j} \alpha_3^{k} \beta_1^{i^2} \beta_2^{j^2} \beta_3^{k^2} \gamma_{12}^{ij} \gamma_{13}^{ik} \gamma_{23}^{jk} \psi_{ijk} \quad (1 \le i, j, k \le R),$$

where $\psi_{ijk}$ is symmetric in its three indices. Without loss of generality we may set, e.g., $\alpha_3 = \beta_3 = \gamma_{23} = 1$. We shall denote this model by GLDPS-3. A special case of this model obtained by putting $\gamma_{12} = \gamma_{13} = \gamma_{23} = 1$ is the ELDPS-3 model; namely, this model is an extension of the ELDPS-3 model. If it is reasonable to assume a more general underlying three-variate normal distribution that requires neither equal marginal variances nor equal correlations, then the GLDPS-3 model, rather than the ELDPS-3 model, may be appropriate for an ordinal three-way table (see Section 7).
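As a concrete check on the GLDPS-3 parametrization, the sketch below builds a three-way table from arbitrary parameters (hypothetical values; cells keyed by 1-based indices) and verifies the ratio implied by the model when $X_1$ and $X_2$ are interchanged at a fixed level $k$ of $X_3$, namely $p_{ijk}/p_{jik} = (\alpha_2/\alpha_1)^{j-i}(\beta_2/\beta_1)^{j^2-i^2}(\gamma_{23}/\gamma_{13})^{(j-i)k}$, which follows from the model equation by cancelling the common factors. Setting all $\gamma_{st} = 1$ gives the ELDPS-3 model, and setting also all $\beta_t = 1$ gives the LDPS-3 model.

```python
import itertools
import math
import random


def gldps3_table(alpha, beta, gamma, psi_of, R=3):
    """GLDPS-3 cell probabilities, keyed by 1-based cells (i, j, k).

    p_ijk is proportional to prod_t alpha[t]**x_t * beta[t]**(x_t**2)
    * gamma12**(i*j) * gamma13**(i*k) * gamma23**(j*k) * psi(sorted cell)."""
    raw = {}
    for cell in itertools.product(range(1, R + 1), repeat=3):
        i, j, k = cell
        w = psi_of[tuple(sorted(cell))]  # symmetric psi: depends only on the multiset
        for t, x in enumerate(cell):
            w *= alpha[t] ** x * beta[t] ** (x * x)
        w *= gamma[(1, 2)] ** (i * j) * gamma[(1, 3)] ** (i * k) * gamma[(2, 3)] ** (j * k)
        raw[cell] = w
    total = sum(raw.values())
    return {cell: w / total for cell, w in raw.items()}


random.seed(4)
R = 3
psi_of = {c: random.uniform(0.5, 1.5)
          for c in itertools.combinations_with_replacement(range(1, R + 1), 3)}
alpha = [1.3, 0.8, 1.0]                          # alpha_3 = 1
beta = [0.95, 1.05, 1.0]                         # beta_3 = 1
gamma = {(1, 2): 1.1, (1, 3): 0.9, (2, 3): 1.0}  # gamma_23 = 1
p = gldps3_table(alpha, beta, gamma, psi_of, R)

# Interchanging X_1 and X_2 at a fixed level k of X_3
for (i, j, k) in p:
    predicted = ((alpha[1] / alpha[0]) ** (j - i)
                 * (beta[1] / beta[0]) ** (j * j - i * i)
                 * (gamma[(2, 3)] / gamma[(1, 3)]) ** ((j - i) * k))
    assert math.isclose(p[(i, j, k)] / p[(j, i, k)], predicted, rel_tol=1e-9)
print("p_ijk / p_jik matches the GLDPS-3 prediction in every cell")
```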

Decompositions for the Symmetry Model
Using a specified monotonic function $g(k)$, $k = 1, \ldots, R$, first consider the marginal mean equality (ME-3) model, defined by

$$\mu_1 = \mu_2 = \mu_3,$$

where $\mu_t = E(g(X_t))$.
Secondly, consider the marginal variance equality (VE-3) model, defined by

$$\sigma_1^2 = \sigma_2^2 = \sigma_3^2,$$

where $\sigma_t^2 = \mathrm{var}(g(X_t))$. Finally, consider the correlation equality (CE-3) model, defined by

$$\rho_{12} = \rho_{13} = \rho_{23},$$

where $\rho_{st}$ is the correlation between $g(X_s)$ and $g(X_t)$. We obtain the following theorems (Theorems 3.1, 3.2, and 3.3). Their proofs are omitted because they are obtained in a similar way to the proof of Theorem 2.1.
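The quantities constrained by the ME-3, VE-3, and CE-3 models are ordinary moments of the scores $g(X_t)$. A minimal Python sketch computing them from a three-way probability table is given below; the table is a dict keyed by 1-based cells, `g` is a hypothetical score list with `g[k-1]` the score of category $k$, and the function name is illustrative.

```python
import itertools
import math
import random


def marginal_moments(p, g):
    """Means mu_t, variances sigma_t^2 and correlations rho_st of g(X_t), t = 1, 2, 3.

    p maps cells (i, j, k), with categories 1..R, to probabilities summing to 1."""
    mean = [sum(w * g[cell[t] - 1] for cell, w in p.items()) for t in range(3)]
    cov = [[sum(w * (g[cell[s] - 1] - mean[s]) * (g[cell[t] - 1] - mean[t])
                for cell, w in p.items())
            for t in range(3)] for s in range(3)]
    var = [cov[t][t] for t in range(3)]
    corr = {(s + 1, t + 1): cov[s][t] / math.sqrt(var[s] * var[t])
            for s in range(3) for t in range(s + 1, 3)}
    return mean, var, corr


# A table satisfying the S-3 model: each probability depends only on the multiset of indices.
random.seed(5)
R = 3
weight = {c: random.uniform(0.5, 1.5)
          for c in itertools.combinations_with_replacement(range(1, R + 1), 3)}
raw = {cell: weight[tuple(sorted(cell))] for cell in itertools.product(range(1, R + 1), repeat=3)}
total = sum(raw.values())
p = {cell: w / total for cell, w in raw.items()}

mean, var, corr = marginal_moments(p, g=[1, 2, 3])
print(mean, var, corr)  # equal means, equal variances, equal correlations under S-3
```

Under the S-3 model all three marginal distributions and all three pairwise joint distributions coincide, so the ME-3, VE-3, and CE-3 constraints hold automatically, as the decomposition theorems require.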

Extension to Multi-Way Tables
We extend the models and decompositions in Section 3 to multi-way tables. For an $R^T$ contingency table, let $p_{i_1 \ldots i_T}$ denote the probability that an observation falls in the $(i_1, \ldots, i_T)$ cell of the table ($i_t = 1, \ldots, R$; $t = 1, \ldots, T$).
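First, the symmetry model is defined by

$$p_{i_1 \ldots i_T} = \psi_{i_1 \ldots i_T},$$

where $\psi_{i_1 \ldots i_T} = \psi_{j_1 \ldots j_T}$ for every $(j_1, \ldots, j_T) \in D(i_1, \ldots, i_T)$, and $D(i_1, \ldots, i_T)$ denotes the set of all permutations of $(i_1, \ldots, i_T)$. We shall denote this model by S-T. In particular, when $T = 3$, the S-T model is the S-3 model. Secondly, we consider a model defined by

$$p_{i_1 \ldots i_T} = \left( \prod_{t=1}^{T} \alpha_t^{i_t} \right) \psi_{i_1 \ldots i_T},$$

where $\psi_{i_1 \ldots i_T} = \psi_{j_1 \ldots j_T}$ for $(j_1, \ldots, j_T) \in D(i_1, \ldots, i_T)$. Note that we may set, e.g., $\alpha_T = 1$. We shall denote this model by LDPS-T.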
Thirdly, we consider a model defined by

$$p_{i_1 \ldots i_T} = \left( \prod_{t=1}^{T} \alpha_t^{i_t} \beta_t^{i_t^2} \right) \psi_{i_1 \ldots i_T},$$

where $\psi_{i_1 \ldots i_T} = \psi_{j_1 \ldots j_T}$ for $(j_1, \ldots, j_T) \in D(i_1, \ldots, i_T)$. Note that we may set, e.g., $\alpha_T = \beta_T = 1$. We shall denote this model by ELDPS-T. Lastly, we consider a model defined by

$$p_{i_1 \ldots i_T} = \left( \prod_{t=1}^{T} \alpha_t^{i_t} \beta_t^{i_t^2} \right) \left( \prod_{1 \le s < t \le T} \gamma_{st}^{i_s i_t} \right) \psi_{i_1 \ldots i_T},$$

where $\psi_{i_1 \ldots i_T} = \psi_{j_1 \ldots j_T}$ for $(j_1, \ldots, j_T) \in D(i_1, \ldots, i_T)$. Note that we may set, e.g., $\alpha_T = \beta_T = \gamma_{T-1,T} = 1$. We shall denote this model by GLDPS-T. When $T = 2$, this model is identical to the ELDPS model; thus, the GLDPS-T model is defined for $T \ge 3$. Note that Bishop et al. (1975, p.303) defined the QS model for three-way tables, and Bhapkar and Darroch (1990) defined the $h$th-order ($1 \le h < T$) QS model for multi-way $R^T$ tables (also see Agresti, 2002, p.440, for the first-order QS model). The LDPS-T and ELDPS-T models are special cases of the first-order QS model, and the GLDPS-T model is a special case of the second-order QS model.
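Taking $D(i_1, \ldots, i_T)$ to be the set of all permutations of $(i_1, \ldots, i_T)$ (our reading of the notation), the S-T model says that a cell probability depends only on the multiset of its indices, so an $R^T$ table has $\binom{R+T-1}{T}$ distinct symmetric parameters $\psi$. The Python sketch below (hypothetical helper names `D` and `satisfies_st`) enumerates $D$ and checks the S-T model for a table stored as a dict.

```python
import itertools
import math
import random


def D(cell):
    """All permutations of the index tuple (i_1, ..., i_T), without duplicates."""
    return set(itertools.permutations(cell))


def satisfies_st(p, tol=1e-12):
    """True if p[cell] == p[other] for every other in D(cell) (the S-T model)."""
    return all(abs(p[cell] - p[other]) <= tol for cell in p for other in D(cell))


R, T = 3, 4
classes = list(itertools.combinations_with_replacement(range(1, R + 1), T))
print(len(classes), math.comb(R + T - 1, T))  # number of distinct psi parameters

random.seed(6)
psi = {c: random.random() for c in classes}
cells = list(itertools.product(range(1, R + 1), repeat=T))
total = sum(psi[tuple(sorted(c))] for c in cells)
p = {c: psi[tuple(sorted(c))] / total for c in cells}
print(satisfies_st(p))   # True
p[(1, 1, 2, 3)] += 0.01  # break symmetry in one cell
print(satisfies_st(p))   # False
```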
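Denote the ME, VE, and CE models for $R^T$ tables (that is, $\mu_1 = \cdots = \mu_T$, $\sigma_1^2 = \cdots = \sigma_T^2$, and $\rho_{st}$ equal for all $1 \le s < t \le T$) by ME-T, VE-T, and CE-T, respectively. Then we obtain the following decomposition theorems of the S-T model for $R^T$ tables.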

Theorem 4.1 The S-T model holds if and only if both the LDPS-T and ME-T models hold.
Theorem 4.2 The S-T model holds if and only if all the ELDPS-T, ME-T and VE-T models hold.

Theorem 4.3 The S-T model holds if and only if all the GLDPS-T, ME-T, VE-T and CE-T models hold.
The proofs of these theorems are omitted because they are obtained in similar ways to the proof of Theorem 2.1.
Table 1: Numbers of degrees of freedom (df) for models applied to the $R^T$ table ($T \ge 2$), where the GLDPS-T model is defined for $T \ge 3$.
Assume that a multinomial distribution applies to the $R^T$ table. The maximum likelihood estimates of the expected frequencies under each model can be obtained by applying the Newton-Raphson method to the likelihood equations or by iterative procedures, for example, the generalized iterative scaling procedure for log-linear models of Darroch and Ratcliff (1972). Each model can be tested for goodness-of-fit by, e.g., the likelihood ratio chi-square statistic (denoted by $G^2$) with the corresponding degrees of freedom (df). For example, for square tables,

$$G^2 = 2 \sum_{i=1}^{R} \sum_{j=1}^{R} n_{ij} \log\left( \frac{n_{ij}}{\hat{m}_{ij}} \right),$$

where $n_{ij}$ is the observed frequency in the $(i, j)$th cell and $\hat{m}_{ij}$ is the maximum likelihood estimate of the expected frequency $m_{ij}$ under the given model. The numbers of df for the models are given in Table 1. Note that the number of df for the S-T model is equal to the sum of those for the decomposed models.
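For the S model in a square table the maximum likelihood estimates have the well-known closed form $\hat{m}_{ij} = (n_{ij} + n_{ji})/2$, so $G^2$ can be computed directly; the other models need iterative fitting. The sketch below also counts df by subtracting free parameters, using the identifiability constraints stated for the models ($\alpha_T = \beta_T = \gamma_{T-1,T} = 1$). These df formulas are our own count and should be checked against Table 1; they do reproduce the df values quoted in the examples (10, 9, and 8 for the S, LDPS, and ELDPS models with $R = 5$; 17 and 2 for the S-3 and ME-3 models with $R = 3$). `math.comb` requires Python 3.8 or later.

```python
import math


def g2_symmetry(n):
    """Likelihood ratio statistic G^2 for the S model in a square table of counts n.

    Under the S model the ML estimates are m_ij = (n_ij + n_ji) / 2; cells with
    n_ij = 0 contribute nothing (0 * log 0 = 0)."""
    R = len(n)
    g2 = 0.0
    for i in range(R):
        for j in range(R):
            if n[i][j] > 0:
                m_hat = (n[i][j] + n[j][i]) / 2.0
                g2 += 2.0 * n[i][j] * math.log(n[i][j] / m_hat)
    return g2


def df_models(R, T):
    """df for the models on an R^T table: (#cells - #permutation classes) minus extra parameters."""
    s = R ** T - math.comb(R + T - 1, T)  # S-T: one psi per permutation class of cells
    pairs = T * (T - 1) // 2
    df = {
        "S-T": s,
        "LDPS-T": s - (T - 1),       # alpha_1, ..., alpha_{T-1} free
        "ELDPS-T": s - 2 * (T - 1),  # plus beta_1, ..., beta_{T-1}
        "ME-T": T - 1,               # mu_1 = ... = mu_T
        "VE-T": T - 1,               # sigma_1^2 = ... = sigma_T^2
    }
    if T >= 3:
        df["GLDPS-T"] = s - 2 * (T - 1) - (pairs - 1)  # plus all gamma_st but one
        df["CE-T"] = pairs - 1                          # all rho_st equal
    return df


print(df_models(5, 2))  # square 5 x 5 table, as in Example 1
print(df_models(3, 3))  # 3 x 3 x 3 table, as in Examples 2 and 3
```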

Example 1
Table 2, taken directly from Agresti (1984, p.206), gives occupational mobility data for British father-son pairs. These data have been analyzed by several authors, including Bishop et al. (1975, p.100), Goodman (1981, 1984), Agresti (1984, pp.205-206), and Tomizawa (1990a, 1990b, 1990c, 1991). Table 3 gives the values of the likelihood ratio statistic $G^2$ for the models applied to these data. The S model fits the data in Table 2 very poorly, since $G^2 = 37.5$ ($p < 0.001$) with 10 df. The LDPS model does not fit these data very well either, yielding $G^2 = 17.1$ ($p = 0.047$) with 9 df. However, the ELDPS model fits these data well, yielding $G^2 = 11.1$ ($p = 0.194$) with 8 df. Using Theorems 2.1 and 2.2, we shall consider why the S model fits these data poorly.

Table 2: Occupational status for British father-son pairs; from Agresti (1984, p.206). The parenthesized values are the maximum likelihood estimates of expected frequencies under the ELDPS model.
The VE model with $g(k) = k$ ($k = 1, \ldots, 5$) fits the data in Table 2 very well, but the ME model with $g(k) = k$ fits these data poorly (see Table 3). Therefore, since the ELDPS and VE models fit these data well, it follows from Theorem 2.2 that the poor fit of the S model is caused by the poor fit of the ME model rather than by the ELDPS and VE models.
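The p-values quoted for Example 1 can be recomputed from the reported $G^2$ values with the chi-square upper-tail probability. The standard-library sketch below uses the recurrence $Q(a+1, y) = Q(a, y) + y^{a} e^{-y}/\Gamma(a+1)$ for the regularized upper incomplete gamma function, with $P(\chi^2_\nu > x) = Q(\nu/2, x/2)$; in practice a library routine such as `scipy.stats.chi2.sf` would normally be used instead. Because the $G^2$ values are rounded to one decimal, the recomputed p-values can differ slightly from those reported.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(chi^2_df > x) for an integer df >= 1."""
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)             # Q(1, y) = exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))  # Q(1/2, y) = erfc(sqrt(y))
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


# G^2 values and df reported in Example 1 (Table 3)
for name, g2, df in [("S", 37.5, 10), ("LDPS", 17.1, 9), ("ELDPS", 11.1, 8)]:
    print(f"{name}: G2 = {g2}, df = {df}, p = {chi2_sf(g2, df):.4f}")
```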

Example 2
The data in Table 4 are from the treatment group only of randomized clinical trials conducted by a pharmaceutical company in anemic patients with cancer receiving chemotherapy. The response is the patient's hemoglobin (Hb) concentration at baseline (before treatment) and after 4 and 8 weeks of treatment. Table 4 shows the $3 \times 3 \times 3$ array of counts of Hb response classified as $\ge 10$ g/dl, 8-10 g/dl, and $< 8$ g/dl.
The S-3 model fits the data in Table 4 very poorly, yielding $G^2 = 76.2$ ($p < 0.001$) with 17 df (Table 5). By using the decompositions of the S-3 model, we shall consider why the S-3 model fits these data poorly. Each of the GLDPS-3 and VE-3 models with $g(k) = k$ fits the data in Table 4 very well, but the LDPS-3, ELDPS-3, ME-3, and CE-3 models fit these data poorly (see Table 5). From Theorem 3.2, the poor fit of the S-3 model is caused by the poor fits of both the ELDPS-3 and ME-3 models with $g(k) = k$ (rather than the VE-3 model). Also, from Theorem 3.3, the poor fit of the S-3 model is caused by the poor fits of both the ME-3 and CE-3 models with $g(k) = k$ (rather than the GLDPS-3 and VE-3 models).

Example 3
The data in Table 6, taken directly from Bishop et al. (1975, p.305), give the $3 \times 3 \times 3$ array of counts of stationary two-step transitions in a panel survey of potential voters in Erie County, Ohio, in 1940, summarizing voting intentions for the 1940 presidential election. Although each voter's intended party was classified as Republican, Democrat, or Undecided, we regard 'Undecided' as a middle category between Republican and Democrat and order the categories as Republican, Undecided, Democrat.
The S-3 model fits these data poorly, yielding $G^2 = 229.8$ with 17 df (Table 7). By using the decompositions for the S-3 model, we shall consider why the S-3 model fits these data poorly.
The ME-3 model does not fit the data in Table 6 very well, since $G^2 = 6.58$ ($p < 0.05$) with 2 df, but it fits much better than any of the other models (Table 7). In terms of the various decomposition theorems, the poor fit of the S-3 model may be caused by the poorer fits of the other models rather than by the ME-3 model.

The decompositions of the S-T model would be useful for identifying the reason for a poor fit when the S-T model fits the data poorly. Moreover, decomposing the S-T model into more (three or four) models rather than into two would be useful for seeing the reason for a poor fit in more detail.
Because the S model can be decomposed in at least two ways, one may ask which decomposition should be applied. For square tables, from Theorems 2.1 and 2.2, the S model is decomposed into (1) the LDPS and ME models and (2) the ELDPS, ME, and VE models. However, the LDPS model is not equivalent to the ELDPS and VE models holding simultaneously. Therefore, both decompositions should be applied when analyzing the data.
It may seem to readers that in the examples the decomposed models (e.g., the LDPS and ME models) are tested only after the S model is rejected, so that the test of the S model can be seen as a preliminary test. However, the decomposed models should be applied even if the S model is accepted. Assuming that the LDPS model holds, the hypothesis that the S model holds, i.e., $\delta = 1$ in the LDPS model, can be tested by the difference between the $G^2$ values for the S and LDPS models. Even if the S model fits the data well, the data need not have a completely symmetric structure. For ordinal data, we are then also interested in the structure of asymmetry, e.g., that of the LDPS model. The estimate of the parameter $\delta$ in the LDPS model would be useful for inferring, for example, that $X_1$ is stochastically less than $X_2$ or vice versa, according as the estimated $\delta$ is greater than 1 or less than 1. So, for ordinal data, the LDPS model would be useful even when the S model fits the data well. The ME and VE models would be useful for examining the structure of the marginal distributions.
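For instance, using the $G^2$ values reported in Example 1 for the data in Table 2, the conditional test of $\delta = 1$ given the LDPS model works out as follows (our arithmetic on the reported values):

$$G^2(\mathrm{S}) - G^2(\mathrm{LDPS}) = 37.5 - 17.1 = 20.4, \qquad \mathrm{df} = 10 - 9 = 1.$$

Since $20.4$ exceeds the upper 0.1% point of the chi-square distribution with 1 df (about 10.83), the hypothesis $\delta = 1$ would be rejected at that level for these data, assuming the LDPS model holds.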
It may also seem that the decision procedure consists of a sequence of likelihood ratio tests and therefore raises a simultaneous testing problem. However, when we only want to see which of the decomposed models fits more poorly (e.g., by comparing p-values), we would not need to adjust the individual significance levels. If we want to judge whether the S model holds by judging whether each of the decomposed models holds at a given significance level, we should adjust the individual significance levels.
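If the individual levels are to be adjusted, one simple choice (our assumption; the text does not prescribe a method) is the Bonferroni adjustment, which tests each of the $k$ decomposed models at level $\alpha/k$ so that the overall level is at most $\alpha$. A minimal Python sketch, with a hypothetical function name and hypothetical p-values:

```python
def judge_symmetry(p_values, alpha=0.05):
    """Bonferroni-adjusted judgement of the S model from its decomposition.

    p_values maps each decomposed model to its goodness-of-fit p-value. Each model
    is accepted if p >= alpha / k; the S model is judged to hold only if all are accepted."""
    level = alpha / len(p_values)
    accepted = {model: p >= level for model, p in p_values.items()}
    return all(accepted.values()), accepted


# Hypothetical p-values for the decomposition into the LDPS and ME models
holds, detail = judge_symmetry({"LDPS": 0.03, "ME": 0.40})
print(holds, detail)  # True {'LDPS': True, 'ME': True}: 0.03 >= 0.05 / 2
```

Bonferroni is conservative; less conservative step-down procedures (e.g., Holm's) could be substituted without changing the overall logic.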

Theorem 3.1 The S-3 model holds if and only if both the LDPS-3 and ME-3 models hold.

Theorem 3.2 The S-3 model holds if and only if all the ELDPS-3, ME-3, and VE-3 models hold.

Theorem 3.3 The S-3 model holds if and only if all the GLDPS-3, ME-3, VE-3, and CE-3 models hold.

Table 3: Likelihood ratio chi-square values $G^2$ for models applied to the data in Table 2.

Table 4: Hemoglobin concentration at baseline, 4 weeks, and 8 weeks in carcinomatous anemia patients from a randomized clinical trial. The parenthesized values are the maximum likelihood estimates of expected frequencies under the GLDPS-3 model.

Table 5: Likelihood ratio chi-square values $G^2$ for models applied to the data in Table 4.

Table 6: Stationary two-step transitions in a panel study of potential voters in Erie County, Ohio, 1940 (from Bishop et al., 1975, p.305).

Table 7: Likelihood ratio chi-square values $G^2$ for models applied to the data in Table 6.