Decompositions of Marginal Homogeneity Model Using Cumulative Logistic Models for Square Contingency Tables with Ordered Categories

For square contingency tables with ordered categories, Agresti (1984, 2002) considered the marginal cumulative logistic (ML) model, which is an extension of the marginal homogeneity (MH) model. The ML model depends on the probabilities on the main diagonal of the table. This paper (1) proposes the conditional marginal cumulative logistic (CML) model, which does not depend on the probabilities on the main diagonal, and (2) decomposes the MH model into the ML (CML) model and a model which indicates the equality of the row and column marginal means. Examples are given.


Introduction
Austrian Journal of Statistics, Vol. 34 (2005), No. 4, 361-373

For the R × R square contingency table with ordered categories, let p_{ij} denote the probability that an observation will fall in the cell in row i (i = 1, ..., R) and column j (j = 1, ..., R), and let X and Y denote the row and column variables, respectively. First, consider the marginal homogeneity (MH) model defined by

    p_{i·} = p_{·i}   for i = 1, ..., R,                                (1)

where p_{i·} = Σ_{j=1}^{R} p_{ij} and p_{·i} = Σ_{j=1}^{R} p_{ji}. This model indicates that the row marginal distribution is the same as the column marginal distribution. For this model see also Stuart (1955), Bishop et al. (1975, p. 294), Agresti (1984, p. 207), and Tomizawa (1991, 1993b, 1998).
Let F^X_i and F^Y_i denote the marginal cumulative probabilities of X and Y, respectively. These are

    F^X_i = Σ_{k=1}^{i} p_{k·},   F^Y_i = Σ_{k=1}^{i} p_{·k},   i = 1, ..., R − 1.

Then the MH model may also be expressed as F^X_i = F^Y_i for i = 1, ..., R − 1. Let L^X_i and L^Y_i denote the marginal cumulative logits of X and Y, respectively. These are given as

    L^X_i = log( F^X_i / (1 − F^X_i) )   and   L^Y_i = log( F^Y_i / (1 − F^Y_i) ),

for i = 1, ..., R − 1. Then the MH model may be further expressed as L^X_i = L^Y_i for i = 1, ..., R − 1. As an extension of the MH model, Agresti (1984, p. 205; 2002, p. 420) considered the marginal cumulative logistic (ML) model defined by

    L^X_i = Δ + L^Y_i   for i = 1, ..., R − 1.                          (2)

This model indicates that one marginal distribution is a location shift of the other marginal distribution on a logistic scale. So this model states that the odds that X is i or below, instead of i + 1 or above, are exp(Δ) times the odds that Y is i or below instead of i + 1 or above, for i = 1, ..., R − 1. If Δ > 0, X rather than Y tends to be i or below instead of i + 1 or above, for i = 1, ..., R − 1. The ML model may also be expressed as

    F^X_i / (1 − F^X_i) = exp(Δ) · F^Y_i / (1 − F^Y_i)   for i = 1, ..., R − 1.

The special case of this model obtained by putting Δ = 0 is the MH model. Therefore the MH model implies the ML model, but the converse does not hold. Hence we are interested in what restrictions need to be imposed on the ML model in order that the MH model holds.
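As a small numerical illustration (a sketch of our own, not code from the paper; the function name and toy table are hypothetical), the cumulative marginals F^X_i, F^Y_i and the logits L^X_i, L^Y_i can be computed directly from a table of cell probabilities:

```python
import math

# Sketch: marginal cumulative probabilities and cumulative logits for an
# R x R table of cell probabilities p[i][j] (rows = X, columns = Y).
def marginal_cum_logits(p):
    R = len(p)
    row = [sum(p[i]) for i in range(R)]                       # p_{i.}
    col = [sum(p[i][j] for i in range(R)) for j in range(R)]  # p_{.j}
    FX = [sum(row[:i + 1]) for i in range(R - 1)]
    FY = [sum(col[:i + 1]) for i in range(R - 1)]
    LX = [math.log(f / (1.0 - f)) for f in FX]
    LY = [math.log(f / (1.0 - f)) for f in FY]
    return LX, LY

# Under the ML model the differences LX[i] - LY[i] equal a common
# constant Delta; under the MH model that constant is zero.  A symmetric
# toy table satisfies MH, so every difference is zero:
p = [[0.20, 0.05, 0.05],
     [0.05, 0.20, 0.05],
     [0.05, 0.05, 0.30]]
LX, LY = marginal_cum_logits(p)
```

Checking the differences LX[i] − LY[i] for constancy is only a descriptive diagnostic; fitting Δ by maximum likelihood is discussed later in the paper.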
The MH model defined by (1) essentially does not depend on the probabilities {p_ii} on the main diagonal of the table. Thus, the MH model may also be expressed as

    p^c_{i·} = p^c_{·i}   for i = 1, ..., R,

where δ = Σ_{k≠l} p_{kl}, p^c_{ij} = p_{ij}/δ for i ≠ j, p^c_{i·} = Σ_{j≠i} p^c_{ij}, and p^c_{·i} = Σ_{j≠i} p^c_{ji}. Let F^{X(c)}_i and F^{Y(c)}_i denote the corresponding conditional marginal cumulative probabilities, F^{X(c)}_i = Σ_{k=1}^{i} p^c_{k·} and F^{Y(c)}_i = Σ_{k=1}^{i} p^c_{·k}. Then the MH model may be further expressed as F^{X(c)}_i = F^{Y(c)}_i for i = 1, ..., R − 1. We are also interested in proposing another ML model, defined by (2) with {F^X_i} and {F^Y_i} replaced by {F^{X(c)}_i} and {F^{Y(c)}_i}. In addition, we are interested in decomposing the MH model using such another ML model.
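The conditional quantities above can also be sketched in code (a hypothetical illustration of our own); note that changing the diagonal cells leaves the conditional cumulative probabilities unchanged:

```python
# Sketch: conditional (off-diagonal) marginal cumulative probabilities
# F^{X(c)}_i and F^{Y(c)}_i.  delta is the total off-diagonal
# probability and p^c_{ij} = p_{ij} / delta for i != j.
def conditional_cum_probs(p):
    R = len(p)
    delta = sum(p[i][j] for i in range(R) for j in range(R) if i != j)
    row_c = [sum(p[i][j] for j in range(R) if j != i) / delta
             for i in range(R)]
    col_c = [sum(p[j][i] for j in range(R) if j != i) / delta
             for i in range(R)]
    FXc = [sum(row_c[:i + 1]) for i in range(R - 1)]
    FYc = [sum(col_c[:i + 1]) for i in range(R - 1)]
    return FXc, FYc

p = [[0.20, 0.05, 0.05],
     [0.10, 0.20, 0.05],
     [0.05, 0.05, 0.25]]
FXc, FYc = conditional_cum_probs(p)

# Same off-diagonal cells, different diagonal: identical result.
p2 = [[0.00, 0.05, 0.05],
      [0.10, 0.50, 0.05],
      [0.05, 0.05, 0.00]]
FXc2, FYc2 = conditional_cum_probs(p2)
```

The invariance under changes to the diagonal is exactly the property that motivates the CML model proposed below.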
Table 3 is taken from Andersen (1980, p. 328) and includes the results of two consecutive opinion polls held in August and October 1971, in connection with the Danish referendum on whether to join the Common Market or not.
For analyzing the data of square tables with ordered categories, like Tables 1, 2 and 3, various models of symmetry or asymmetry are applied, e.g., the symmetry, quasi-symmetry, MH, and conditional symmetry models (see, e.g., Tomizawa, 1993a). These models do not depend on the main diagonal cell probabilities {p_ii}; namely, they indicate a structure of symmetry or asymmetry on condition that an observation falls in one of the off-diagonal cells of the table. So many statisticians may be interested in the structure of symmetry or asymmetry for the off-diagonal cell probabilities {p_ij}, i ≠ j. We are also interested in another ML model, defined by (2) with {F^X_i} and {F^Y_i} replaced by {F^{X(c)}_i} and {F^{Y(c)}_i}, which indicates a structure of marginal inhomogeneity on condition that an observation falls in one of the off-diagonal cells of the table.
In addition, readers may be interested in, for example, (1) for the data in Table 1, seeing how a woman's right eye is symmetric or asymmetric to her left eye on condition that her right eye grade differs from her left eye grade, and (2) for the data in Table 3, seeing how an individual whose opinion in poll II changed from that in poll I changed attitude. So we are again interested in models which indicate a structure of marginal inhomogeneity on condition that an observation falls in one of the off-diagonal cells of the table.
When the MH model does not hold for the data, many statisticians may be interested in applying various models of symmetry or asymmetry instead of the MH model, and in selecting the best-fitting model by using, e.g., the AIC criterion (see Sakamoto et al., 1986). In addition, we are then interested in the reason why the MH model does not hold. Although the AIC criterion cannot reveal this reason, a decomposition of the MH model into several models may be useful for seeing it, according to which of the decomposed models fits well and which fits poorly (see Section 3 for details).
Generally, consider a decomposition of a model, say M_1, such that model M_1 holds if and only if both models M_2 and M_3 hold. When models M_1 and M_2 fit the data poorly and model M_3 fits the data well, we can then understand that the poor fit of model M_1 is caused by the lack of the structure of model M_2 rather than that of model M_3. Thus, the decomposition of model M_1 (especially into two models) would be useful for seeing the reason for the poor fit of model M_1.
Denote the likelihood ratio chi-squared statistic for testing the goodness of fit of model M by G^2(M). For testing whether model M_1 holds assuming that model M_2 holds, the likelihood ratio statistic is given as

    G^2(M_1 | M_2) = G^2(M_1) − G^2(M_2).

The decomposition of model M_1 into models M_2 and M_3 then also has the advantages that (1) when model M_2 holds, the conditional statistic G^2(M_1 | M_2) is more powerful than the unconditional statistic G^2(M_1) (see, e.g., Agresti, 1984, p. 82), and (2) from the decomposition of model M_1 it is possible to see what structure the hypothesis that model M_1 holds, assuming that model M_2 holds, indicates, namely the structure that model M_3 holds. The purpose of this paper is (1) to propose another ML model based on {F^{X(c)}_i} and {F^{Y(c)}_i}, and (2) to decompose the MH model using the two kinds of ML models. The decompositions may be useful for seeing the reason for the poor fit when the MH model fits the data poorly.

Decompositions of the MH Model
We shall consider two kinds of decompositions of the MH model.

A decomposition of the MH model using the ML model
Consider a model defined as

    Σ_{k=1}^{R} k ( p_{k·} − p_{·k} ) = 0,                              (4)

i.e., E(X) = E(Y). This indicates that the mean of the row variable X equals the mean of the column variable Y. Note that the MH model implies model (4). Consider a specified monotonic function g(·) with g(1) ≤ g(2) ≤ ... ≤ g(R), where at least one strict inequality holds. Using the function g(k), model (4) is generalized as

    Σ_{k=1}^{R} g(k) ( p_{k·} − p_{·k} ) = 0,                           (5)

i.e., E(g(X)) = E(g(Y)). We shall refer to (5) as the marginal mean equivalence (ME) model. This indicates that the mean of g(X) is equal to the mean of g(Y).
The {g(k)} may be considered as ordered scores {u_k} assigned to both the row and column categories, if it is possible to assign such scores; namely, when the scores are equal-interval, i.e., u_{k+1} − u_k is constant for all k, the ME model with g(k) = u_k is equivalent to model (4), i.e., the ME model with g(k) = k. Therefore, model (5) as applied to the examples in Section 3 may also be regarded as the ME model with g(k) = k. (We note that when the scores are not equal-interval, it seems difficult to obtain an intuitive interpretation. In many cases, however, equal-interval scores seem to be used.) We now obtain the following theorem.
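The defining sum in (5) is easy to evaluate numerically. The sketch below (our own illustration; the function name and the toy marginals are hypothetical) builds an independence table whose row and column marginals differ but have equal means, so ME holds while MH fails:

```python
# Sketch of the ME model (5): with scores g(k),
#   sum_k g(k) * (p_{k.} - p_{.k}) = 0,  i.e.  E(g(X)) = E(g(Y)).
def me_gap(p, g=None):
    R = len(p)
    if g is None:
        g = list(range(1, R + 1))      # equal-interval scores g(k) = k
    row = [sum(p[i]) for i in range(R)]
    col = [sum(p[i][j] for i in range(R)) for j in range(R)]
    return sum(g[k] * (row[k] - col[k]) for k in range(R))

# Marginals that differ but share the mean 2.0:
rowm = [0.2, 0.6, 0.2]
colm = [0.3, 0.4, 0.3]
# Independence table with exactly these marginals.
p = [[rowm[i] * colm[j] for j in range(3)] for i in range(3)]
gap = me_gap(p)   # zero up to rounding: ME holds, although MH fails
```

This makes concrete why ME is strictly weaker than MH: equality of means does not force equality of the two marginal distributions.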
Theorem 1: The MH model holds if and only if both the ML and ME models hold.

Proof. If the MH model holds, then both the ML and ME models hold. Therefore, we assume that both the ML and ME models hold, and then we show that the MH model holds. We have

    E(g(X)) = Σ_{k=1}^{R} g(k) p_{k·} = g(1) + Σ_{i=1}^{R−1} d_i (1 − F^X_i),

where d_i = g(i+1) − g(i) ≥ 0 for i = 1, ..., R − 1, with at least one of the d_i not equal to zero. Similarly, we have

    E(g(Y)) = g(1) + Σ_{i=1}^{R−1} d_i (1 − F^Y_i).

This yields

    E(g(X)) − E(g(Y)) = Σ_{i=1}^{R−1} d_i (F^Y_i − F^X_i).

Since the ML and ME models hold, the left-hand side is zero, while under the ML model each difference F^Y_i − F^X_i, i = 1, ..., R − 1, has the sign opposite to that of Δ. Therefore we obtain Δ = 0, i.e., the MH model holds. The proof is completed.

A decomposition of the MH model using the CML model
Consider a model defined by

    L^{X(c)}_i = Δ* + L^{Y(c)}_i   for i = 1, ..., R − 1,

where L^{X(c)}_i and L^{Y(c)}_i are the logits of the conditional marginal cumulative probabilities F^{X(c)}_i and F^{Y(c)}_i. We shall refer to this model as the conditional marginal cumulative logistic (CML) model. The CML model indicates that, given that an observation falls in one of the off-diagonal cells of the table, one conditional marginal distribution is a location shift of the other conditional marginal distribution on a logistic scale. Therefore, this model states that, given that an observation falls in an off-diagonal cell, the odds that X is i or below, instead of i + 1 or above, are exp(Δ*) times the odds that Y is i or below instead of i + 1 or above, for i = 1, ..., R − 1. If Δ* > 0, X rather than Y tends to be i or below instead of i + 1 or above. We now obtain the following theorem.
Theorem 2: The MH model holds if and only if both the CML and ME models hold.
We omit the proof because it can be obtained in a similar way as the proof of Theorem 1.
Let n_ij denote the observed frequency in the ith row and jth column of the R × R table, with n = Σ_i Σ_j n_ij, and let m_ij denote the corresponding expected frequency. We assume that the {n_ij} have a multinomial distribution. The maximum likelihood estimates (MLEs) of the expected frequencies under each model can be easily obtained using the Newton-Raphson method to solve the likelihood equations (see the Appendix, e.g., for the CML model). The numbers of degrees of freedom (df) for testing the goodness of fit of the MH, ML (CML), and ME models are R − 1, R − 2, and 1, respectively.
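Once the fitted expected frequencies {m_ij} are available, the goodness-of-fit statistic itself is straightforward. A minimal sketch (our own; the Newton-Raphson fitting step is deliberately not shown):

```python
import math

# Likelihood ratio statistic G^2(M) = 2 * sum_ij n_ij * log(n_ij / m_ij),
# where m_ij are the fitted expected frequencies under model M.
# Cells with n_ij = 0 contribute zero to the sum.
def g2(n, m):
    return 2.0 * sum(nij * math.log(nij / mij)
                     for nrow, mrow in zip(n, m)
                     for nij, mij in zip(nrow, mrow)
                     if nij > 0)

# Degrees of freedom for an R x R table, as stated in the text.
def df_mh(R): return R - 1
def df_ml(R): return R - 2   # the same for the CML model
def df_me(R): return 1
```

For example, a perfect fit (n_ij = m_ij for all cells) gives G^2 = 0, and for R = 4 the df are 3, 2, and 1 for the MH, ML (CML), and ME models, respectively.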

Examples
We analyze the data in Tables 1 to 3 using the decompositions of the MH model. Table 4 presents the values of the likelihood ratio statistic G^2 for the models applied to these data.
Table 4: Likelihood ratio values G^2 for the models applied to the data in Tables 1, 2 and 3. An asterisk (*) means significant at the 0.05 level.

Unaided vision data
When the MH model is applied to the data in Table 1, this model fits poorly, yielding G^2(MH) = 11.99 with 3 df. However, the ML (CML) model, which is one of the decomposed models for the MH model, fits the data well, with G^2(ML) = 0.39 and G^2(CML) = 0.03 with 2 df. On the other hand, the ME model with g(k) = k, k = 1, 2, 3, 4, which is the other decomposed model, fits these data very poorly, yielding G^2(ME) = 11.98 with 1 df. Since the ML (CML) model fits very well, we consider the hypothesis that the MH model holds under the assumption that the ML (CML) model holds, i.e., the hypothesis that Δ = 0 (Δ* = 0) under that assumption; by Theorem 1 (Theorem 2) this is equivalent to the hypothesis that the ME model holds under the same assumption, i.e., that the mean right eye grade equals the mean left eye grade (E(X) = E(Y)). The difference between the G^2 values for the MH and ML (CML) models is 11.99 − 0.39 = 11.60 (11.99 − 0.03 = 11.96) with 1 df, and thus this hypothesis is rejected at the 0.05 significance level. This shows very strong evidence of Δ ≠ 0 (Δ* ≠ 0), i.e., exp(Δ) ≠ 1 (exp(Δ*) ≠ 1) in the ML (CML) model, i.e., very strong evidence of E(X) ≠ E(Y) under the ML (CML) model. Therefore the ML (CML) model is preferable to the MH model for these data.
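The conditional test can be reproduced arithmetically from the G^2 values quoted above for Table 1 (a sketch of our own; 3.841 is the standard upper 5% point of the chi-square distribution with 1 df):

```python
# Conditional likelihood-ratio test of MH given ML for Table 1, using
# the reported statistics G^2(MH) = 11.99 (3 df) and G^2(ML) = 0.39 (2 df).
g2_mh, df_mh = 11.99, 3
g2_ml, df_ml = 0.39, 2

g2_cond = g2_mh - g2_ml        # G^2(MH | ML) = G^2(MH) - G^2(ML)
df_cond = df_mh - df_ml        # 3 - 2 = 1 df
CHI2_1DF_05 = 3.841            # upper 5% point of chi-square with 1 df
reject_mh = g2_cond > CHI2_1DF_05
```

The same arithmetic with G^2(CML) = 0.03 gives 11.96 on 1 df, so the conclusion is identical for the CML-based decomposition.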
Under the ML model we get the estimate exp(Δ̂) = 1.06, or Δ̂ = 0.05 with a standard error of 0.02. Note that exp(Δ) is estimated to be greater than 1 (from the test described above).

Thus, under the ML model, the odds that a woman's right eye grade is i or below, instead of i + 1 or above, for i = 1, 2, 3, are estimated to be 1.06 times the odds that her left eye grade is i or below instead of i + 1 or above. Since exp(Δ̂) > 1, a woman's right eye grade rather than her left eye grade tends to be i or below instead of i + 1 or above. Thus, a woman's right eye tends to be better than her left eye.

Under the CML model the MLE is exp(Δ̂*) = 1.21, or Δ̂* = 0.19 with a standard error of 0.06. Thus, from the CML model it follows that, given that a woman's right eye grade differs from her left eye grade, the odds that her right eye grade is i or below, instead of i + 1 or above, are estimated to be 1.21 times the odds that her left eye grade is i or below instead of i + 1 or above. Since exp(Δ̂*) > 1, her right eye grade rather than her left eye grade tends to be i or below instead of i + 1 or above. Therefore, when a woman's right eye grade differs from her left eye grade, her right eye tends to be better than her left eye.

Ewe Data
Consider the data in Table 2. When the MH model is applied to these data, the model fits poorly and gives G^2(MH) = 18.65 with 2 df. In addition, the ML (CML) model also fits poorly, yielding G^2(ML) = 18.55 and G^2(CML) = 18.59 with 1 df. However, the ME model with g(k) = k, k = 0, 1, 2, fits very well and gives G^2(ME) = 0.07 with 1 df. Therefore, we consider the hypothesis that the MH model holds assuming that the ME model also holds. Since the difference between the G^2 values for the MH and ME models is 18.65 − 0.07 = 18.58 with 1 df, this hypothesis is rejected at the 0.05 significance level, and hence the ME model is preferable to the MH model.
The ME model indicates that the mean number of lambs born to a ewe in 1953 equals the mean number in 1952. However, the distribution of the number of lambs in 1953 differs from that in 1952, because of the poor fit of the MH model. Note that under the ME model, the mean number of lambs born to a ewe in 1953 (and in 1952) is estimated to be 0.65.

Danish Opinion Polls Data
From Table 4 we see that the MH model fits the data in Table 3 poorly. However, the ML (CML) model fits well, yielding G^2(ML) = 1.69 and G^2(CML) = 2.76 with 1 df. The ME model with g(k) = k, k = 1, 2, 3, fits poorly (see Table 4).
Consider the hypothesis that the MH model holds under the assumption that the ML (CML) model holds; that is, we test the hypothesis that Δ = 0 (Δ* = 0) under that assumption. Since the difference between the G^2 values for the MH and ML (CML) models exceeds the 5% critical value with 1 df, we reject the hypothesis at the 0.05 level. This shows strong evidence of Δ ≠ 0 (Δ* ≠ 0) in the ML (CML) model, i.e., evidence of E(X) ≠ E(Y). Therefore, the ML (CML) model is preferable to the MH model here.
Under the ML model we get exp(Δ̂) = 1.23, i.e., Δ̂ = 0.21 with a standard error of 0.08. Thus, the ML model indicates that (1) the odds that an individual's opinion is 'yes' instead of 'not yes' (i.e., undecided or no) are estimated to be 1.23 times higher in poll I than in poll II, and (2) the odds that the individual's opinion is 'not no' instead of 'no' are estimated to be 1.23 times higher in poll I than in poll II. Interpretation (2) may also be described as: the odds that the individual's opinion is 'no' instead of 'not no' are estimated to be 1.23 times higher in poll II than in poll I. Therefore, (i) the tendency that an individual's opinion is 'yes' is stronger in poll I than in poll II, and (ii) the tendency that it is 'no' is stronger in poll II than in poll I.
Next, under the CML model we get exp(Δ̂*) = 1.76, or Δ̂* = 0.56 with a standard error of 0.24. Thus, the CML model indicates that (1) given that an individual's opinion in poll II changed from that in poll I, the odds that the opinion is 'yes' instead of 'not yes' are estimated to be 1.76 times higher in poll I than in poll II, and (2) under the same condition, the odds that the opinion is 'no' instead of 'not no' are estimated to be 1.76 times higher in poll II than in poll I. Therefore, given that an individual's opinion in poll II changed from that in poll I, (i) the tendency that the opinion is 'yes' is stronger in poll I than in poll II, and (ii) the tendency that it is 'no' is stronger in poll II than in poll I.

Concluding Remarks
The decompositions of the MH model are useful for seeing the reason for its poor fit.
Indeed, for the data in Table 1, the poor fit of the MH model is caused by the poor fit of the ME model rather than of the ML (or CML) model, i.e., by the fact that the mean of a woman's right eye grade differs from the mean of her left eye grade. Note that under the ML model, which fits these data very well, the MLE of the mean right eye grade is 2.28 and that of the mean left eye grade is 2.30; under the CML model, which also fits very well, they are 2.27 and 2.31, respectively. Conversely, for the data in Table 2, the poor fit of the MH model is caused by the poor fit of the ML (or CML) model rather than of the ME model. Also, for the data in Table 3, the poor fit of the MH model is caused by the poor fit of the ME model rather than of the ML or CML model.
The MH model, like the CML and ME models, does not depend on the probabilities {p_ii} on the main diagonal of the table, but the ML model does depend on them. Notice that the estimated expected frequencies in the main diagonal cells under the ML model differ from the observed frequencies on the main diagonal (see Tables 1 and 3). Thus, if the MH model does not hold and we want to see the reason, in particular the reason why the equalities of the conditional row cumulative probabilities {F^{X(c)}_i} and the conditional column cumulative probabilities {F^{Y(c)}_i} do not hold, the analyst may be interested in inferring the structure of the off-diagonal probabilities {p_ij}, i ≠ j, and not the main diagonal probabilities {p_ii}. In this case, the decomposition of the MH model into the CML and ME models may be preferable to that into the ML and ME models when the analyst wants to see the reason why the equalities of {F^{X(c)}_i} and {F^{Y(c)}_i}, or the equalities of {p^c_{i·}} and {p^c_{·i}}, do not hold. However, the MH model indicates the equalities of the row cumulative probabilities {F^X_i} and the column cumulative probabilities {F^Y_i}, which include the probabilities {p_ii} on the main diagonal. Therefore, if the MH model does not hold and we want to see the reason why the equalities of {F^X_i} and {F^Y_i} do not hold, the analyst may also be interested in inferring the structure of {F^X_i} and {F^Y_i}. Then the decomposition of the MH model into the ML and ME models may be preferable to that into the CML and ME models when the analyst wants to see the reason why the equalities of {F^X_i} and {F^Y_i}, or of {p_{i·}} and {p_{·i}}, do not hold, because each of the ML and ME models can be expressed as a function of {F^X_i} and {F^Y_i}. For the data in Table 3, readers may be interested in seeing, given that an individual's opinion in poll II changed from that in poll I, how the individual changed attitude. Then the CML model rather than the ML model is useful.
The decompositions of the MH model described here should be considered for ordinal categorical data, because each of the decomposed models is not invariant under the same arbitrary permutations of the row and column categories (although the MH model is invariant under them).
The reader may also be interested in whether it is possible to decompose the test statistic for the MH model. For the decompositions of the MH model described here, it is not guaranteed that the test statistic for the MH model is asymptotically equivalent to the sum of the test statistics for the decomposed models, although the number of df for the MH model is equal to the sum of those for the decomposed models. However, the decomposition of the MH model would be useful for seeing the reason for the poor fit when the MH model fits the data poorly.

Table 1 :
Unaided distance vision of 7477 women aged 30-39 employed in Royal Ordnance factories from 1943 to 1946. The upper and lower parenthesized values are the MLEs of the expected frequencies under the ML and CML models, respectively.

Table 2 :
Merino ewes according to the number of lambs born in consecutive years. The parenthesized values are the MLEs of the expected frequencies under the ME model.

Table 3 :
The results from two Danish polls on the question: Do you think Denmark should join the Common Market? The upper and lower parenthesized values are the MLEs of the expected frequencies under the ML and CML models, respectively.
* Note: the g(k) for the ME model are the equal-interval scores.