Exact Computation of Pearson Statistics Distribution and Some Experimental Results

The Pearson statistic for the multinomial scheme and its modifications are used in various goodness-of-fit tests. As a rule, the selection of critical values for the Pearson statistic is based on the convergence of its distribution to the χ² distribution with the appropriate number of degrees of freedom as the sample size tends to infinity. In practice sample sizes are bounded, and the question of the accuracy of such an approximation (especially for distribution tails) arises naturally. Results of investigations of this problem were reported, in particular, in Holzman and Good (1986), where more than 250 examples of the equiprobable multinomial scheme with N ∈ [2, 160] outcomes and sample sizes T ∈ [10, 80] were considered. To compute the distribution function of the Pearson statistic, Holzman and Good (1986) used generating functions, and Good, Gover, and Mitchell (1970) used the Fast Fourier Transform. A computational method for the distribution of decomposable statistics (also based on generating functions) was proposed in Selivanov (2006). We propose to compute the Pearson statistics distribution by means of the Markov chain method suggested in Zubkov (1996, 2002); this method may also be applied to distributions of decomposable statistics for multinomial and some other schemes.


Introduction
The Pearson statistic for the multinomial scheme and its modifications are used in various goodness-of-fit tests. As a rule, the selection of critical values for the Pearson statistic is based on the convergence of its distribution to the χ² distribution with the appropriate number of degrees of freedom as the sample size tends to infinity. In practice sample sizes are bounded, and the question of the accuracy of such an approximation (especially for distribution tails) arises naturally. Results of investigations of this problem were reported, in particular, in Holzman and Good (1986), where more than 250 examples of the equiprobable multinomial scheme with N ∈ [2, 160] outcomes and sample sizes T ∈ [10, 80] were considered. To compute the distribution function of the Pearson statistic, Holzman and Good (1986) used generating functions, and Good, Gover, and Mitchell (1970) used the Fast Fourier Transform. A computational method for the distribution of decomposable statistics (also based on generating functions) was proposed in Selivanov (2006). We propose to compute the Pearson statistics distribution by means of the Markov chain method suggested in Zubkov (1996, 2002); this method may also be applied to distributions of decomposable statistics for multinomial and some other schemes.

Method
Let ν_1, ..., ν_N be the frequencies of N outcomes with probabilities p_1, ..., p_N in a multinomial sample of size T. Random variables of the form

ζ = Σ_{j=1}^N f_j(ν_j),

where f_1(x), ..., f_N(x) are given functions, are called decomposable statistics. In the case f_j(x) = (x − T p_j)² / (T p_j), j = 1, ..., N, we obtain the Pearson statistic

X²_{N,T} = Σ_{j=1}^N (ν_j − T p_j)² / (T p_j).   (1)

If the hypothetical probabilities p_j = m_j / n_j, j = 1, ..., N, are rational, then the formula for the Pearson statistic may be rewritten as

X²_{N,T} = (1/(TMN)) Σ_{j=1}^N (M/m_j)(N/n_j)(n_j ν_j − T m_j)²,   (2)

where M = LCM(m_1, ..., m_N), N = LCM(n_1, ..., n_N); each summand in (2) is an integer, since m_j divides M and n_j divides N.
If all hypothetical probabilities are equal (p_1 = ··· = p_N = 1/N), then the formula for the Pearson statistic may be represented in another form as

X²_{N,T} = (N/T) Σ_{j=1}^N (ν_j − ⟨T/N⟩)² − (N²/T)(T/N − ⟨T/N⟩)²,   (3)

where ⟨x⟩ = [x + 1/2] denotes the nearest integer to x.

Formulas (2) and (3) reduce the computation of the Pearson statistics distribution to that of integer-valued decomposable statistics. Exact distributions of integer-valued random variables may be stored as tables in computer memory. Further, the sequence of partial sums κ_t = ν_1 + ··· + ν_t, t = 1, ..., N, may be considered as a time-inhomogeneous Markov chain with state space {0, 1, ..., T} and transition probabilities

P{κ_{t+1} = v | κ_t = u} = C(T − u, v − u) q_{t+1}^{v−u} (1 − q_{t+1})^{T−v},  q_{t+1} = p_{t+1}/(p_{t+1} + ··· + p_N).   (4)

So the sequences (κ_k, ζ_k) and (κ_k, ζ*_k), k = 1, ..., N, where ζ_k and ζ*_k are the partial sums of the integer-valued summands in (2) and (3) respectively, are Markov chains with transition probabilities induced by (4), and the distribution of (1/(TMN)) ζ_N coincides with that of X²_{N,T} from (2). It follows that we may find the distribution of X²_{N,T} by means of a recursive computation of the distributions of ζ*_k (or ζ_k), k = 1, ..., N. Note that the probabilities p_1, ..., p_N in the definition (4) of the transition probabilities of the chain κ_t need not be equal to the hypothetical probabilities in the formula for the Pearson statistic. In other words, this approach may also be applied to the computation of the Pearson statistics distribution under alternative hypotheses.
In the case of arbitrary probabilities p_1, ..., p_N we may use the same ideas to compute an approximate distribution function of X²_{N,T} by means of discretization. To this end it suffices to introduce the functions h_ε(x) = [1/2 + x/ε] (where ε > 0 and [y] denotes the integer part of y) and to consider the discrete Markov chain of the pairs (κ_k, ζ_k^(ε)), ζ_k^(ε) = Σ_{j=1}^k h_ε(f_j(ν_j)), with transition probabilities p(v|u) of the form (4), where p(v|u) may be defined by probabilities not necessarily coinciding with p_1, ..., p_N. Since ε ζ_N^(ε) differs from X²_{N,T} by at most Nε/2, by reducing the value of ε we may find arbitrarily good estimates for P{X²_{N,T} ≤ x}. The volume of memory used is inversely proportional to ε.
This method has been implemented in several C++ programs. In particular, for equiprobable schemes the Pearson statistics distributions were computed for numbers of outcomes up to hundreds and numbers of trials up to thousands. Computations on a PC take from seconds (if the numbers of outcomes and trials are less than 50) to minutes (if these numbers are of the order of several hundred). Time and memory requirements of the program realizing the algorithm for rational probabilities depend heavily on the arithmetical structure of the probabilities. Time and memory requirements of the program for approximate computation are analogous to those of the program for the equiprobable case.

Experimental Results
Computational experiments reveal some interesting features of the difference between exact Pearson statistics distributions and corresponding chi-square distributions.
Austrian Journal of Statistics, Vol. 37 (2008), No. 1, 129-135

The equiprobable case with N = 10, T = 10 is used in Figure 1 to explain the structure of all subsequent pictures. The left part of Figure 1 shows the piecewise-constant distribution function of the exact Pearson statistic, the continuous distribution function of the χ² distribution with 9 degrees of freedom, the discontinuous saw-like difference between the two preceding functions, and the piecewise-linear continuous function connecting the mean values of the difference at discontinuity points (the "average difference"). In the following we consider plots of differences and average differences only (as in the right part of this figure).

In Figure 2 we plot the differences and average differences for the equiprobable case with N = 10 outcomes and T = 100 and T = 1000 trials. Note that the shapes of the plots are almost independent of T, while the ranges of the graphs are approximately inversely proportional to T. The average differences have the form of a fading wave; the sign of the average difference becomes negative on the right tail of the distribution (after x ≈ 25), but eventually it becomes positive again (due to the boundedness of the Pearson statistics distribution).

Plots of the differences for equiprobable cases with constant ratios T/N = 10 and T/N = 2 are shown in Figures 3 and 4, respectively. We can see that the shapes of the differences depend only slightly on N, that the shapes of the average differences are more stable, and that the ranges of the graphs are approximately inversely proportional to the square root of the number of outcomes. So a large number of outcomes may compensate for an insufficient number of trials even when the quotient T/N is as small as 2 (Figure 4). As the distribution on the set of outcomes becomes more non-uniform, the shape of the plots of differences approaches the shape of the plots of average differences, namely a fading wave with two minima and one maximum (see Figures 5 and 6). The values of these extrema are comparable with the extremum values of the average differences for the corresponding equiprobable cases.
The shrinking of the differences toward the average differences as the distribution of the outcomes moves away from the uniform one may be explained (on a heuristic level) as follows. The distribution function F_{N,T}(x) = P{X²_{N,T} < x} is approximated by the distribution function of a χ² distribution with N − 1 degrees of freedom. The smaller the value TMN, the greater the weights of the atoms of the distribution of X²_{N,T}, i.e. the jumps of the piecewise-constant distribution function F_{N,T}(x) (in particular, the largest atoms appear in the equiprobable case). Therefore, the accuracy of the approximation of F_{N,T}(x) by the distribution function of the χ² distribution with N − 1 degrees of freedom cannot be good for small values of TMN.
We hope to find a quantitative theoretical explanation of the effects described.