Robust Sequential Testing of Hypotheses on Discrete Probability Distributions

This paper analyses approximations of the models used in sequential hypothesis testing and constructs a new robust sequential test under distortions represented by mixtures of probability distributions. The authors' previous results are extended to the case of arbitrary discrete probability distributions. The theory is illustrated by computer modelling.


Introduction
In applications, especially in medicine (Whitehead, 1997), statistical quality control (Mason and Young, 2002), biology (Durbin, 1998) and finance (Lai, 2001), sequential methods of hypothesis testing (see Wald, 1947; Siegmund, 1985) are used quite often. The use of this type of statistical procedure in such applications follows naturally from the character of the data (see, for example, Bauer and Röhmel, 1995).
Sequential procedures are applied in practice to data that do not follow the considered hypothetical model exactly (Huber, 1981). Usually a high percentage of the data fits the hypothetical model, and under the assumptions of this model the sequential test may be optimal. But a small part of the data does not follow the hypothetical model: the hypothetical model is distorted (see Rieder, 1994). This often leads to a loss of optimality of the test. That is why a robustness analysis of the sequential test in use should be performed, and robust tests need to be constructed to treat data under distortions.
Some results on robustness analysis for sequential testing of hypotheses on data from continuous distributions are presented in Quang (1985). Our previous results relate to a special case of the discrete model of observations (see Kharin, 2002). In this paper, we consider the case of arbitrary discrete probability distributions with finite sets of values.

Mathematical Model, Sequential Probability Ratio Test, Distortions
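Let independent discrete random variables $x_1, x_2, \ldots$ with values in a finite set $U$ be defined on a measurable space $(\Omega, \mathcal{F})$. Let these random variables be identically distributed:
$$P\{x_n = u\} = P_\theta(u), \quad u \in U, \quad n = 1, 2, \ldots, \tag{1}$$
where $\theta \in \Theta = \{0, 1\}$ is an unobservable parameter. There are two simple hypotheses on the parameter $\theta$:
$$H_0 : \theta = 0, \qquad H_1 : \theta = 1. \tag{2}$$
Austrian Journal of Statistics, Vol. 34 (2005), No. 2, 153-162
Denote the accumulated likelihood ratio statistic:
$$\Lambda_n = \sum_{t=1}^{n} \lambda_W(x_t), \tag{3}$$
where
$$\lambda_W(u) = \ln \frac{P_1(u)}{P_0(u)}, \quad u \in U.$$
To test the hypotheses (2) after $n$ ($n = 1, 2, \ldots$) observations, the decision
$$d_n = 1_{[C_+, +\infty)}(\Lambda_n) + 2 \cdot 1_{(C_-, C_+)}(\Lambda_n) \tag{4}$$
is made according to the sequential probability ratio test (SPRT) (see Siegmund, 1985). By $1_D(\cdot)$ we denote the indicator function of the set $D$. The decisions $d_n = 0$ and $d_n = 1$ mean stopping of the observation process and acceptance of the corresponding hypothesis. The decision $d_n = 2$ means that the $(n + 1)$-th observation is to be made. The thresholds $C_-, C_+ \in \mathbb{R}$, $C_- < C_+$, are parameters of the test.
Consider the model of distortions which will be analysed in Section 4. Let the hypothetical model be under contamination of the Tukey-Huber type (see Huber, 1981). This means that instead of (1) the observations $x_1, x_2, \ldots$ come from a mixture of discrete probability distributions
$$\bar{P}_k(u) = (1 - \varepsilon_k) P_k(u) + \varepsilon_k \tilde{P}_k(u), \quad u \in U, \quad k = 0, 1, \tag{5}$$
where $\varepsilon_k \in [0, \varepsilon_{k+}]$, $k = 0, 1$, are unknown probabilities of contamination, and $\tilde{P}_k(u)$ is an arbitrary contaminating probability distribution, $\tilde{P}_k(\cdot) \neq P_k(\cdot)$.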
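The procedure described in this section can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. It assumes the usual SPRT form in which the increments $\ln(P_1(u)/P_0(u))$ are summed and compared with the two thresholds, and it draws contaminated observations from a two-component mixture. The names `sprt` and `contaminated_sampler`, the dictionary representation of the distributions and the `max_n` cap are hypothetical choices made for the example.

```python
import math
import random


def sprt(sample, p0, p1, c_minus, c_plus, max_n=10_000):
    """Run an SPRT on observations produced by the callable `sample`.

    p0, p1 map each value u to its (strictly positive) probability
    under H0 and H1. Returns (decision, n): decision 0 or 1 accepts
    H0 or H1 after n observations; decision 2 means the cap max_n
    (an assumption of this sketch) was reached without stopping.
    """
    llr = 0.0  # accumulated log-likelihood ratio
    for n in range(1, max_n + 1):
        u = sample()
        llr += math.log(p1[u] / p0[u])
        if llr >= c_plus:
            return 1, n
        if llr <= c_minus:
            return 0, n
    return 2, max_n


def contaminated_sampler(p, p_tilde, eps, rng):
    """Sampler for the mixture (1 - eps) * p + eps * p_tilde.

    p and p_tilde are dictionaries over the same set of values.
    """
    values = list(p)

    def sample():
        # With probability eps the observation comes from the contaminating law.
        source = p_tilde if rng.random() < eps else p
        return rng.choices(values, weights=[source[v] for v in values])[0]

    return sample
```

Calling `sprt(contaminated_sampler(p0, p_tilde, 0.1, random.Random(1)), p0, p1, c_minus, c_plus)` repeatedly and averaging the outcomes gives Monte Carlo estimates of the error probability and sample size under a chosen contamination.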

Evaluation of the Characteristics of the Test
In Kharin (2002) explicit expressions for the characteristics of the test (4) are given under an additional assumption on the function $\lambda_W(\cdot)$:
$$\lambda_W(u) = m_u a, \quad m_u \in \mathbb{Z}, \quad u \in U, \quad a > 0. \tag{6}$$
Under this assumption the statistic $\Lambda_n$ takes values only in the lattice $a\mathbb{Z}$, so the thresholds $C_-$, $C_+$ can be replaced with $a \lfloor C_-/a \rfloor$ and $a \lceil C_+/a \rceil$ respectively without changes in the test (4). As long as (6) is satisfied, assume that $C_-/a,\ C_+/a \in \mathbb{Z}$. For $k = 0, 1$ introduce the notation:
where $\delta_{i,j}$ is the Kronecker symbol and $1(\cdot)$ is the unit step function. Define the matrices
$k = 0, 1$. We denote the $i$-th column of the matrix $B^{(k)}$ as $B^{(k)}_{(i)}$. Let $1_{N-2}$ be the column vector of size $(N - 2)$ all elements of which are equal to 1.
The next result gives explicit expressions for the error probabilities of type I and II, denoted by $\alpha$, $\beta$, and for the conditional expected sequence sizes (ESSs) $t^{(k)}$ (if the hypothesis $H_k$ is true), $k = 0, 1$, for the test (4) under the assumption (6).
Theorem 1 (Kharin, 2002) If under conditions (1), (2), (6) $|S^{(k)}| \neq 0$, then the characteristics of the test (4) have the following expressions:
If the assumption (6) is violated, we can use the so-called approximated test to get estimates of the conditional error probabilities and the conditional ESSs of the test (4). We construct a function $\lambda : U \to \mathbb{R}$ that satisfies (6) and approximates the function $\lambda_W(\cdot)$:
$$|\lambda(u) - \lambda_W(u)| \le \delta, \quad u \in U, \tag{8}$$
where $\delta > 0$ is a parameter of approximation. As shown in Kharin (2004), a function $\lambda$ satisfying (8), (6) can always be constructed.
As $\lambda(\cdot)$ satisfies (6), Theorem 1 holds, and we get the conditional error probabilities and the ESSs of the test based on the function $\lambda(\cdot)$. With $\delta$ small enough, these values approximate the characteristics of the SPRT. Choosing $\lambda(\cdot)$ in a proper way, we get upper and lower bounds for these characteristics.
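One simple way to obtain lattice-valued functions that bracket the log-likelihood ratio is to round it down and up to multiples of $\delta$. The Python sketch below does this. It is only an illustration under that assumption and is not claimed to be the construction of Kharin (2004); the function names `log_likelihood_ratio` and `lattice_multipliers` are hypothetical.

```python
import math


def log_likelihood_ratio(p0, p1):
    """ln(P_1(u) / P_0(u)) for every value u; all probabilities must be > 0."""
    return {u: math.log(p1[u] / p0[u]) for u in p0}


def lattice_multipliers(llr, delta):
    """Integer multipliers of the lattice step delta bracketing llr.

    Returns (m_lower, m_upper) such that, up to floating-point rounding,
    m_lower[u] * delta <= llr[u] <= m_upper[u] * delta
    and both bounds lie within delta of llr[u].
    """
    m_lower = {u: math.floor(v / delta) for u, v in llr.items()}
    m_upper = {u: math.ceil(v / delta) for u, v in llr.items()}
    return m_lower, m_upper
```

Keeping the integer multipliers rather than the products with `delta` avoids accumulating rounding error when the increments are later summed, and both rounded functions take values in the same lattice with step `delta`.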

Theorem 2 If the function $\lambda(\cdot)$ in the test (4), (3) is bounded by the functions $\underline{\lambda}(\cdot)$, $\overline{\lambda}(\cdot)$:
where
Proof. From the sequential test construction it follows that $\tau_0 = \tau_1$. We first prove the first inequality for $\alpha$. By the definition, we have
The second inequality for $\alpha$ and the inequalities (10) for $\beta$ are proved in the same way.
From the total probability formula we get
From the definition of the stopping times $\underline{\tau}_k$, $\tau_k$, $\overline{\tau}_k$, $k = 0, 1$, it follows that
and
From the definition of the conditional error probabilities we get
Let us analyse the summands in (13):
Using (14), (15) we get
Because of the finiteness of the integral (see Wald, 1947) and because of the "shrinking" probabilities (15) as $\delta \to 0$, we have from (16):
An upper bound for the second summand in (13) is constructed in the same way:
Finally,
Considering (17) and (18) together, we arrive at the upper bound (11) for $t^{(0)}$. The lower bound for $t^{(0)}$ and both bounds for $t^{(1)}$ are proved by the same scheme.
Theorem 3 If the conditions of Theorem 2 hold, then the following inequalities are satisfied:
Proof. Analyse the first summand in (13):
Considering the second summand in (13), we get:
From (13), (20), (21) we get the first inequality of (19). The second inequality is proved in the same way.
The upper bounds (19) can be calculated using the theory of denumerable Markov chains (Kemeny et al., 1966). Although the result of Theorem 3 is not asymptotic, in practice we recommend using the main terms (12) of the asymptotic expansions (11), because the inequalities in (19) are "rough".
The next result gives the explicit expressions for the conditional ESSs under assumption (6).
Theorem 4 If under conditions (1), (2), (6) $|S^{(k)}| \neq 0$, $k = 0, 1$, then for the test (4)
Proof. Define a random sequence
Then $\xi_n$ is a homogeneous Markov chain with $N$ states, two of which, corresponding to the thresholds $C_-$ and $C_+$, are absorbing. If the hypothesis $H_k$, $k = 0, 1$, is true, the matrix of transition probabilities of $\xi_n$ is given by
where $I_2$ is the $(2 \times 2)$ identity matrix and $0_{2 \times (N-2)}$ is the $(2 \times (N-2))$ matrix with all elements equal to 0. The vector of initial probabilities of the nonabsorbing states of $\xi_n$ equals $\pi^{(k)}$, and the initial probabilities of the absorbing states equal $\pi_{C_+}$. The rest of the proof follows from the theory of finite Markov chains (Kemeny and Snell, 1959).
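The Markov chain argument in this proof can be illustrated numerically. The Python sketch below applies the standard calculation for absorbing finite Markov chains to a lattice-valued statistic: it solves two linear systems with the matrix $I - Q$, where $Q$ holds the transition probabilities between nonabsorbing states. The paper's own matrices and initial vector $\pi^{(k)}$ are not reproduced here. The sketch assumes instead that the statistic starts at 0 before the first observation, that increments are integers (units of $a$), and that the integer thresholds satisfy `c_minus < 0 < c_plus`. The names `solve` and `sprt_characteristics` are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular matrix: the chain never stops")
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def sprt_characteristics(increments, c_minus, c_plus):
    """Exact characteristics of a lattice SPRT started at 0.

    increments maps each integer step m to its probability under the
    distribution of interest. Values <= c_minus or >= c_plus are lumped
    into two absorbing states. Returns (probability of stopping at the
    upper threshold, expected number of observations).
    """
    states = list(range(c_minus + 1, c_plus))  # nonabsorbing states
    index = {s: i for i, s in enumerate(states)}
    n = len(states)
    i_minus_q = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    r_plus = [0.0] * n  # one-step probabilities of absorption at the top
    for s in states:
        for step, p in increments.items():
            nxt = s + step
            if nxt >= c_plus:
                r_plus[index[s]] += p
            elif nxt > c_minus:
                i_minus_q[index[s]][index[nxt]] -= p
    accept_h1 = solve(i_minus_q, r_plus)
    expected_n = solve(i_minus_q, [1.0] * n)
    return accept_h1[index[0]], expected_n[index[0]]
```

With the increment distribution taken under $H_0$, the first returned value is the type I error probability. Taken under $H_1$, one minus the first value is the type II error probability. In both cases the second value is the conditional expected number of observations.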
We can also use the results of Theorems 2, 4 to evaluate the accuracy of existing estimates obtained via approximation of $\lambda_W(\cdot)$. Suppose we already have the approximations $\alpha$, $\beta$ of the error probabilities obtained using the function $\lambda(\cdot)$ that satisfies (6) and approximates $\lambda_W(\cdot)$ with an accuracy $\delta$. Considering (6), let us denote $m^* = \max\{m \in \mathbb{Z} : a/m \ge \delta\}$, $a_\delta = a/m^*$, and define the functions $\underline{\lambda}(\cdot)$, $\overline{\lambda}(\cdot)$:
Note that the functions (23) satisfy (6) with $a = a_\delta$.
Corollary 1 If the functions $\underline{\lambda}(\cdot)$, $\overline{\lambda}(\cdot)$ are defined by (23), then the following inequalities hold for the two tests of the type (4) based on these functions:
The second inequality is proved similarly. Let us note that the asymptotics $\overline{\alpha} - \underline{\alpha} \to 0$, $\overline{\beta} - \underline{\beta} \to 0$ is reached by taking $\delta \to 0$.
The accuracy of the estimates of the conditional ESSs can be evaluated in the same way.

Minimax Robust Sequential Test
A family of modified sequential tests is proposed in Kharin (2002) to robustify (4):
where
and $g_-, g_+ \in \mathbb{R}$, $g_- < g_+$, are parameters of (24). The minimax robust test is defined as the solution of the extremal problem:
where $C > 1$ is a parameter, $w_0, w_1 \ge 0$ are the losses caused by errors of type I and II respectively, $\pi_0, \pi_1$ are known prior probabilities of the hypotheses, and the remaining quantities are the error probabilities and conditional ESS of the test (24) under the least favorable contaminating distribution, which is given in Kharin (2002). Note that the test (24) can be treated as the test (4) based on $\lambda(u) = g(\lambda_W(u))$. Hence, if $\lambda(\cdot)$ satisfies (6), Theorem 1 holds and the problem (25) can be solved numerically by iterating through all possible values of $g_-$, $g_+$.

Numerical Results
To illustrate the theoretical results we performed computer modelling. The case of the observed sequence of random vectors was considered.
The hypotheses $H_0$, $H_1$ were formed by the expressions:
The thresholds $C_-$ and $C_+$ for the test (4) were calculated using the Wald formulae:
$$C_- = \ln \frac{\beta_0}{1 - \alpha_0}, \qquad C_+ = \ln \frac{1 - \beta_0}{\alpha_0},$$
where $\alpha_0$ and $\beta_0$ are the so-called "desired" conditional error probabilities (they are the maximal possible values of the error probabilities of type I and II).
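For reference, the classical Wald approximation for the thresholds on the logarithmic scale can be computed as in the short Python sketch below. It is assumed here that this is the form of the Wald formulae referred to in the text; the function name `wald_thresholds` is hypothetical.

```python
import math


def wald_thresholds(alpha0, beta0):
    """Wald's approximate log-scale thresholds for desired error levels.

    alpha0, beta0 are the desired type I and type II error probabilities,
    both strictly between 0 and 1. Returns (c_minus, c_plus).
    """
    c_minus = math.log(beta0 / (1.0 - alpha0))
    c_plus = math.log((1.0 - beta0) / alpha0)
    return c_minus, c_plus
```

For equal desired error levels the two thresholds are symmetric about zero, which is a convenient sanity check when setting up a simulation.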
The results are presented in Tables 3 and 4. For comparison, the estimates of the SPRT (4) characteristics are also given in the tables. The results show the robustness of the test (24).

Table 1: Conditional error probabilities for the hypothetical model

Table 2: Conditional ESSs for the hypothetical model

Table 4: Conditional ESSs for the distorted model