Performance and Robustness Analysis of Sequential Hypotheses Testing for Time Series with Trend

The problem of sequential testing of simple hypotheses for time series with a trend is considered. Analytic expressions and asymptotic expansions for error probabilities and expected numbers of observations are obtained. Robustness analysis is performed. Numerical results are given.


Introduction
The sequential approach to testing parametric hypotheses proposed by Wald (see Wald (1947)) has been applied in many practical problems of computer data analysis. The sequential probability ratio test (SPRT) has been proved to be optimal in the sense of minimizing the expected sample size, provided that the type I and type II error probabilities do not exceed preassigned values (see Wald and Wolfowitz (1948)). The evaluation of the performance characteristics of sequential tests (error probabilities and expected numbers of observations) is well studied for the case of identically distributed observations (see Govindarajulu (2004), Kharin (2013), Kharin (2016)). In this paper, the model of non-identically distributed observations is considered for the problem of testing two simple hypotheses (see Kharin and Ton (2016)).
In practice, data often do not follow the hypothetical model exactly (see Huber (1981), Kharin (2005)), and the problem of robustness under distortions (see Kharin (1997), Maevskii and Kharin (2002)) is important for sequential testing (see Kharin (2011), Kharin and Kishylau (2015)). Here we consider the problem of robustness of sequential tests for time series with trend.
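Consider two simple hypotheses ($\theta^0, \theta^1 \in \mathbb{R}^m$ are known vectors):
$$H_0: \theta = \theta^0, \qquad H_1: \theta = \theta^1. \tag{2}$$
Denote the accumulated log-likelihood ratio statistic:
$$\Lambda_n = \sum_{t=1}^{n} \lambda_t, \tag{3}$$
where $\lambda_t = \ln \dfrac{p_t(x_t, \theta^1)}{p_t(x_t, \theta^0)}$ is the log-likelihood ratio calculated on observation $x_t$, and $p_t(x, \theta)$ is the probability density function of $x_t$ provided the true parameter value is $\theta$. To test hypotheses (2), after $n$ observations one makes the decision:
$$d = \begin{cases} 0, & \Lambda_n \le C_-, \\ 1, & \Lambda_n \ge C_+, \\ 2, & C_- < \Lambda_n < C_+. \end{cases} \tag{4}$$
Thresholds $C_-$ and $C_+$ are the parameters of the test. Decisions $d = 0$ and $d = 1$ mean stopping of the observation process and acceptance of $H_0$ or $H_1$ respectively; decision $d = 2$ means that the next observation is to be made. According to Wald (1947),
$$C_- = \ln \frac{\beta_0}{1 - \alpha_0}, \qquad C_+ = \ln \frac{1 - \beta_0}{\alpha_0},$$
where $\alpha_0$, $\beta_0$ are given values for error probabilities of types I and II respectively.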
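The sequential procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the observations are taken to follow $x_t = \theta^T \psi(t) + \xi_t$ with independent Gaussian noise $\xi_t \sim N(0, \sigma^2)$, the thresholds are Wald's approximations, and the parameter values are those quoted for the numerical examples. All function names (`wald_thresholds`, `trend`, `log_lr`, `sprt`) are hypothetical.

```python
import math
import random


def wald_thresholds(alpha0, beta0):
    """Wald's approximate thresholds (C_-, C_+) for target error levels."""
    return math.log(beta0 / (1.0 - alpha0)), math.log((1.0 - beta0) / alpha0)


def trend(theta, psi_t):
    """Trend value theta^T psi(t)."""
    return sum(a * b for a, b in zip(theta, psi_t))


def log_lr(x, psi_t, theta0, theta1, sigma):
    """lambda_t = ln p_t(x, theta1) / p_t(x, theta0), Gaussian noise assumed."""
    return ((x - trend(theta0, psi_t)) ** 2
            - (x - trend(theta1, psi_t)) ** 2) / (2.0 * sigma ** 2)


def sprt(observe, psi, theta0, theta1, sigma, alpha0, beta0, max_n=1000):
    """Return (d, n): d = 0 or 1 is the accepted hypothesis; d = 2 means
    that no threshold was crossed within max_n observations."""
    c_minus, c_plus = wald_thresholds(alpha0, beta0)
    lam = 0.0
    for t in range(1, max_n + 1):
        lam += log_lr(observe(t), psi(t), theta0, theta1, sigma)
        if lam <= c_minus:
            return 0, t
        if lam >= c_plus:
            return 1, t
    return 2, max_n


def psi(t):
    """Basis functions of the trend used in the numerical examples."""
    return (1.0, t / 10.0, t ** 2 / 100.0, 10.0 / t)


theta0, theta1, sigma = (1, 2, 2, 2), (1, 1, 1, 1), 10.0
rng = random.Random(0)  # fixed seed, only for reproducibility of the sketch
d, n = sprt(lambda t: trend(theta0, psi(t)) + rng.gauss(0.0, sigma),
            psi, theta0, theta1, sigma, alpha0=0.05, beta0=0.1)
print("decision:", d, "observations used:", n)
```

The cap `max_n` is a practical safeguard added for the sketch; it is not part of the test itself.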

Some auxiliary results
Let $\zeta_n$, $n \ge 1$, be a sequence of random variables satisfying the following conditions:
Remark 1. A Markov chain with a finite state space $T$, in which the states 0, 1 are absorbing, satisfies conditions (5)-(7).
Introduce the notation:
By (6), matrices $P(k)$ and $P(1, k)$ can be expressed as follows: where $R_k$, $\tilde{R}_k$ are some matrices of size $K \times 2$, $I_k$ is the identity matrix of size $k$, $O_{2 \times K}$ is the $2 \times K$ matrix with all elements equal to 0, and $Q_k$, $\tilde{Q}_k$ are some matrices of size $K \times K$. For $k < n < k + l$ we have: which implies that Therefore, From (8) and (10), we get (Q
Let $t$ be the total number of time moments for which the process $\zeta_n$ belongs to $T_2$; $n_j$ ($j \in T_2$) be the number of time moments for which $\zeta_n = j$; $u_j^k$ be the function that equals 1 if $\zeta_n = j$ after $k$ steps and 0 otherwise; $E_i(\cdot)$ be the conditional expected value given $\zeta_1 = i$.
Denote: $N = \sum_{k=0}^{+\infty} \tilde{Q}_k$; $\tau = N \mathbf{1}_K$, where $\mathbf{1}_K$ is the vector of size $K$ with all elements equal to 1.
Theorem 1. In the above notation, for sequence (5)-(7) the following equations are satisfied:
Proof. Consider the representation $n_j = \sum_{k=0}^{+\infty} u_j^k$. Therefore, $\tilde{Q}_k$; , where $b_{ij}$ is the probability that the sequence $\zeta_n$ started in $i$ is absorbed in $j$, $i \in T_2$, $j \in T_1$.
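The quantities $N$ and $\tau$ can be approximated numerically by truncating the infinite sum. The sketch below is only an illustration under an explicit assumption: $\tilde{Q}_k$ is taken to be the product $Q_1 Q_2 \cdots Q_k$ of the transient blocks, with $\tilde{Q}_0$ equal to the identity matrix, as in the classical theory of absorbing Markov chains. The helper names (`mat_mul`, `truncated_fundamental`, `expected_transient_time`) are hypothetical.

```python
def mat_mul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def truncated_fundamental(q_blocks):
    """Approximate N by I + Q_1 + Q_1 Q_2 + ... + Q_1 ... Q_n.

    q_blocks is the list [Q_1, ..., Q_n] of K x K transient blocks
    (assumed meaning of the notation; see the lead-in).
    """
    size = len(q_blocks[0])
    # The empty product (k = 0) is the identity matrix.
    prod = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    total = [row[:] for row in prod]
    for q in q_blocks:
        prod = mat_mul(prod, q)
        total = [[total[i][j] + prod[i][j] for j in range(size)]
                 for i in range(size)]
    return total


def expected_transient_time(n_matrix):
    """tau = N 1_K: row sums of the (approximate) matrix N."""
    return [sum(row) for row in n_matrix]


# Homogeneous toy chain: symmetric random walk on {0, 1, 2, 3} with
# absorbing end points; the transient states are 1 and 2.
q = [[0.0, 0.5], [0.5, 0.0]]
n_approx = truncated_fundamental([q] * 60)
print(n_approx, expected_transient_time(n_approx))
```

In the homogeneous case the truncated sum converges to $(I - Q)^{-1}$, which gives a simple way to check the sketch; for time-dependent blocks the list `q_blocks` simply contains different matrices.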
Theorem 2. If conditions (5)-(7) hold for $\zeta_n$, then , where $b_{ij}(k)$ is the probability that the process $\zeta_n$ starting in $i$ is absorbed in $j$ after exactly $k$ steps, $i \in T_2$, $j \in T_1$.
We obtain (13) from the facts that B(k then the total expected value $E(t)$ equals:
From the properties of multivariate normal distributions (see Bilodeau and Brenner (1999)), we have:
Lemma 1. (Gut 2005) If $X$ is a non-negative, integer-valued random variable, then $E(X) = \sum_{n=1}^{+\infty} P(X \ge n)$.
Lemma 2. (Gut 2005) Let $r > 0$, and suppose that $X$ is a non-negative random variable. Then the following inequalities hold:
Lemma 3. (Coope 1996) For positive semidefinite matrices $A$, $B$ of the same order, $0 \le \operatorname{tr}(AB) \le \operatorname{tr}(A)\operatorname{tr}(B)$.

Performance analysis for the hypothetical model
For model (1), (3) we have ($t \ge 1$):
Due to the properties of the normal distribution, $\lambda_t$ and $\Lambda_n$ also have normal distributions with the following parameters: where (4) terminates finitely with probability 1.
Proof. We have: Under the theorem condition, we get $s_n^2 \to +\infty$ as $n \to +\infty$. Furthermore, we also have
Corollary 2. Under the conditions of Theorem 3 we get: where $k$ is an index such that $\theta^1_k \ne \theta^0_k$.
Proof. Note that $\Gamma$ and $H_n$ are positive semidefinite matrices. The proof is derived directly from Lemma 3 and the facts that: is bounded, then there exists a positive constant $L$ such that $s_n^2 \to L$ as $n \to +\infty$. In this case, we get $\sigma_n \to 0$ and $\mu$ In addition, we also have:
Theorem 4. Under the conditions of Theorem 3 the following expressions are valid for the characteristics of test (3), (4):
Proof. Under the conditions of Theorem 4 the test terminates finitely with probability 1. From Lemma 1, we have: For the type I error probability we get: From (23) we get (21). The expression for $\beta$ in (22) is proved analogously.
In practice, it is difficult to use formulae (20)-(22) for computing the characteristics of the test: numerical approximation of the multiple integrals on the right-hand sides of these equalities is infeasible. To get upper bounds for these test characteristics, we can use the following estimate: where $i$ is a fixed value in $\{1, 2, \ldots, n\}$.
Clearly, the smaller the value of $i$, the stricter inequality (24) is. In particular, when $\operatorname{tr}(\Gamma H_n)$ tends to $+\infty$ slowly, a smaller value of $i$ should be selected to get better estimates.
Time series data are usually collected at certain intervals. Sometimes the data contain patterns that repeat over fixed periods of time. Such patterns are known as periodic fluctuations or seasonality. In this case, the trend function $g(t) = \theta^T \psi(t)$ is periodic with some period $T > 0$. In particular, the function $h(t) = (\theta^0 - \theta^1)^T \psi(t)$ is also periodic.
Theorem 5. If there exists an integer $T \ge 1$ such that $h(t + T) = h(t)$, $\forall t = 1, 2, \ldots$, then the stopping time $N$ has finite moments of any order.
Proof. Without loss of generality, assume that hypothesis $H_0$ is true. By the conditions of the theorem, $\lambda_t$ and $\lambda_{t+kT}$ have the same distribution for all $t, k \in \mathbb{N}$.
Corollary 4. If the basic vector function of trend ψ(t) is periodic on the set N, then the stopping time N has finite moments of any order.

Special case
Assume that there exists a constant $a \ne 0$ such that $h(t) = a$, $\forall t \ge 1$, and $H_0$ is the true hypothesis.
In this case, $\{\lambda_t, t \ge 1\}$ becomes a sequence of independent and identically distributed random variables from $N(\mu, \sigma_0^2)$, where $\mu = -\dfrac{a^2}{2\sigma^2}$, $\sigma_0^2 = \dfrac{a^2}{\sigma^2}$. Let $x$ be a fixed value and put $\Lambda_n^x = x + \Lambda_n$. The new test based on $\Lambda_n^x$ is equivalent to the original SPRT whose region of indifference is the interval $(C_- - x, C_+ - x)$. Let $\beta_\theta(x)$ and $N_\theta(x)$ be the operating characteristic and average sample size functions of this SPRT respectively. By the Markov property of the log-likelihood ratio statistic $\Lambda_n$, when $H_0$ is true, $\beta_{\theta^0}(x)$ and $N_{\theta^0}(x)$ are known to satisfy the Fredholm integral equations (see Basseville and Nikiforov (1993), Cox and Miller (1965)):
A numerical method is used for solving these equations. Let $m_0$ be a positive integer and in which the $y_i$ having the smallest absolute value is set to be 0. Note that $\alpha = 1 - \beta_{\theta^0}(0)$ and $E(N | H_0) = N_{\theta^0}(0)$. Let $\tilde{\beta}_{\theta^0}(y_i)$ be approximations of $\beta_{\theta^0}(y)$ at $y = y_i$, $i = 1, \ldots, m_0$. Using the trapezoid formula (see Hoffman (2001)) we have: where In the case of a uniform partition, the error of this approximation is $O(h^2)$, where $h = y_{i+1} - y_i$ (see Hoffman (2001)). From the approximation above we get the following system of linear equations: (25)
This system can be represented in the matrix form: where
Similarly, we also have the system of linear equations for the approximations $\tilde{N}_{\theta^0}(y_i)$ of $N_{\theta^0}(y)$ at $y = y_i$, $i = 1, \ldots, m_0$:
This system is also represented in the matrix form: where $\tilde{N}_{\theta^0} = (\tilde{N}_{\theta^0}(y_1), \ldots, \tilde{N}_{\theta^0}(y_{m_0}))^T$ and $B$ is a column vector of size $m_0$ with all elements equal to one.
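The numerical scheme can be illustrated with a short Python sketch. It is written under assumptions that should be read as such: the integral equations are taken in their standard form for a random walk with $N(\mu, \sigma_0^2)$ increments, namely $\beta(x) = \Phi\big((C_- - x - \mu)/\sigma_0\big) + \int_{C_-}^{C_+} \beta(y) f(y - x)\,dy$ and $N(x) = 1 + \int_{C_-}^{C_+} N(y) f(y - x)\,dy$, where $f$ is the increment density; the grid is uniform on $[C_-, C_+]$; and the value at $x$ is obtained by substituting the grid solution back into the discretized equation rather than by forcing a grid node at 0. The function names and the parameter values $\mu = -0.08$, $\sigma_0 = 0.4$ (chosen so that $\mu = -\sigma_0^2 / 2$) are illustrative only.

```python
import math


def norm_pdf(z, mu, s):
    """Density of N(mu, s^2) at z."""
    return math.exp(-0.5 * ((z - mu) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))


def norm_cdf(z, mu, s):
    """Distribution function of N(mu, s^2) at z."""
    return 0.5 * (1.0 + math.erf((z - mu) / (s * math.sqrt(2.0))))


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def sprt_characteristics(mu, s0, c_minus, c_plus, m0, x=0.0):
    """Trapezoid-rule approximations of beta(x) and N(x) under H0."""
    h = (c_plus - c_minus) / (m0 - 1)
    y = [c_minus + i * h for i in range(m0)]
    w = [h] * m0
    w[0] = w[-1] = h / 2.0  # trapezoid weights at the end points
    # Matrix I - A with A[i][j] = w_j * f(y_j - y_i).
    a = [[(1.0 if i == j else 0.0) - w[j] * norm_pdf(y[j] - y[i], mu, s0)
          for j in range(m0)] for i in range(m0)]
    beta = solve(a, [norm_cdf(c_minus - yi, mu, s0) for yi in y])
    size = solve(a, [1.0] * m0)
    # Evaluate both functions at x from the discretized equations.
    kern = [w[j] * norm_pdf(y[j] - x, mu, s0) for j in range(m0)]
    beta_x = norm_cdf(c_minus - x, mu, s0) + sum(k * v for k, v in zip(kern, beta))
    n_x = 1.0 + sum(k * v for k, v in zip(kern, size))
    return beta_x, n_x


c_minus, c_plus = math.log(0.1 / 0.95), math.log(0.9 / 0.05)
beta_at_0, n_at_0 = sprt_characteristics(-0.08, 0.4, c_minus, c_plus, m0=81)
print("alpha ~", 1.0 - beta_at_0, " E(N | H0) ~", n_at_0)
```

Both linear systems share the same matrix, so in a larger computation it would be natural to factorize it once; the sketch solves it twice for brevity.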

Approximation of the random sequence Λ n
Let us split the state space of $\Lambda_n$ into $K + 2$ cells: then for all $i, j, k$, $i < j < k$, the following asymptotic expansions are valid: where
Proof. From the Lemma condition and (17)-(18), there exist two positive constants $C_1$, $C_2$ such that: The rest of the proof is similar to the proof of Theorem 1 in Kharin (2008).
Proof. Under the conditions of Theorem 6 the sequence $Z_n$ satisfies conditions (5)-(7) asymptotically as $h \to 0$. The results of this theorem are derived from Lemma 4 and Theorems 1 and 2.

Robustness evaluation
In practice, the observed data often come from more complicated sources than the hypothetical ones because of distortions. Some contamination models (see Huber (1981)) can be used
$(1, 2, 2, 1)^T$, $\theta^1 = (1, 1, 1, 1)^T$; hypotheses (2) were tested. Denote the sample estimate of a characteristic $\gamma$ obtained by the Monte-Carlo method by $\hat{\gamma}$. The number of simulation runs used in this method was 100 000. Denote
Figure 1: Trend functions
Suppose that hypothesis $H_0$ is true. Then we have $a = (\theta^0 - \theta^1)^T \psi(t) = 4$, $\forall t$. This means that the lengths of segments AB and CD are always the same in all positions in which they are parallel to the vertical axis (Figure 1a). In addition, $\lambda_t$, $t \ge 1$, are independent identically distributed random variables, $\lambda_t \sim N(\mu, \sigma_0^2)$, where $\mu = -50$, $\sigma_0 = 0.4$. Monte-Carlo estimates ($\hat{\alpha}$ and $\hat{t}_0$) and approximate values ($\tilde{\alpha} = 1 - \tilde{\beta}_{\theta^0}(0)$ and $\tilde{t}_0 = \tilde{N}_{\theta^0}(0)$) calculated according to Section 4.2 for the type I error probability $\alpha$ and the conditional average number of observations $t_0$ respectively are presented in Table 1. As $m_0$ increases, the approximate values of the test characteristics tend to the corresponding Monte-Carlo estimates. The dependence of the operating characteristic and the average sample size functions on the initial value $x$ in the modified test is presented in Figure 2 for the case of $m_0 = 200$, $\alpha_0 = 0.05$, $\beta_0 = 0.1$. Under hypothesis $H_0$, $\beta_{\theta^0}(x)$ is a decreasing function of $x$ (Figure 2a). This is easily understood: when $x$ increases, the probability that $\Lambda_n$ leaves the interval $(C_- - x, C_+ - x)$ through the upper boundary $C_+ - x$ also increases. However, the function $N_{\theta^0}(x)$ increases to its maximum value in the interval $(C_-, C_+)$ before dropping (Figure 2b).
Figure 2: Plots of functions $1 - \beta_{\theta^0}(x)$ and $N_{\theta^0}(x)$
In the next examples, the following parameter values are used: $m = 4$, $\sigma = 10$, $\psi(t) = (1, t/10, t^2/100, 10/t)^T$, $\theta^0 = (1, 2, 2, 2)^T$, $\theta^1 = (1, 1, 1, 1)^T$. In this case, $P_k(\Lambda_{50} \in (C_-, C_+)) \le 10^{-6}$, $k \in \{0, 1\}$. All infinite sums were replaced by the sums of the first 50 summands, which provides accuracy of order 0.00001. Figure 1b shows the plots of the trend functions. The upper bounds for the test performance characteristics constructed in Corollary 3 are given in Tables 2 and 3.
The orders of approximation in (28)-(30) are only $O(h)$. Therefore, to make the main terms of the asymptotic expansions more accurate, the value of $K$ must be larger. However, for large $K$ the computation of the infinite sums of the high-dimensional matrices $S(\theta^i)$ and $B(\theta^i)$ is time-consuming (in practice, these sums can reasonably be replaced with finite ones because the test terminates).
Figure 3 shows the dependence of the error probabilities and the average numbers of observations on the contamination probability $\varepsilon$ in model (31), when $\bar{\sigma}^2 = 50\sigma^2$, $\alpha_0 = 0.001$, $\beta_0 = 0.005$. When the contamination probability $\varepsilon$ increases, both error probabilities increase. For both conditional average numbers of observations, the opposite behaviour is observed.
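A robustness experiment of this kind can be sketched with a small Monte-Carlo simulation. The sketch rests on an assumed contamination mechanism, which may differ from model (31): each observation keeps the hypothetical noise scale $\sigma$ with probability $1 - \varepsilon$ and has its noise variance inflated to $50\sigma^2$ with probability $\varepsilon$, while the test statistic is still computed under the hypothetical model. The function names are hypothetical, and the levels $\alpha_0 = 0.05$, $\beta_0 = 0.1$ and the number of runs are chosen only to keep the example fast.

```python
import math
import random


def psi(t):
    """Basis functions of the trend used in the numerical examples."""
    return (1.0, t / 10.0, t ** 2 / 100.0, 10.0 / t)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def run_sprt(rng, theta_true, theta0, theta1, sigma, eps, sigma_bar,
             c_minus, c_plus, max_n=500):
    """One run of the test on contaminated data; returns (decision, n)."""
    lam = 0.0
    for t in range(1, max_n + 1):
        p = psi(t)
        # Assumed contamination: inflated noise scale with probability eps.
        scale = sigma_bar if rng.random() < eps else sigma
        x = dot(theta_true, p) + rng.gauss(0.0, scale)
        # The statistic uses the hypothetical (uncontaminated) model.
        lam += ((x - dot(theta0, p)) ** 2
                - (x - dot(theta1, p)) ** 2) / (2.0 * sigma ** 2)
        if lam <= c_minus:
            return 0, t
        if lam >= c_plus:
            return 1, t
    return 2, max_n  # no decision within max_n observations


def estimate_under_h0(eps, runs=2000, seed=1, alpha0=0.05, beta0=0.1,
                      sigma=10.0, ratio=50.0):
    """Monte-Carlo estimates of the type I error and of E(N | H0)."""
    rng = random.Random(seed)
    theta0, theta1 = (1, 2, 2, 2), (1, 1, 1, 1)
    c_minus = math.log(beta0 / (1.0 - alpha0))
    c_plus = math.log((1.0 - beta0) / alpha0)
    sigma_bar = math.sqrt(ratio) * sigma
    results = [run_sprt(rng, theta0, theta0, theta1, sigma, eps, sigma_bar,
                        c_minus, c_plus) for _ in range(runs)]
    alpha_hat = sum(1 for d, _ in results if d == 1) / runs
    n_hat = sum(n for _, n in results) / runs
    return alpha_hat, n_hat


for eps in (0.0, 0.1, 0.2):
    print(eps, estimate_under_h0(eps))
```

Estimates for $H_1$ can be obtained in the same way by passing `theta1` as the true parameter and counting decisions $d = 0$.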

Conclusion
The problem of sequential testing for time series with trend is studied. A sufficient condition for termination of the test is given. Besides the explicit (but not useful for further analysis) formulae for the test characteristics, an approach to approximating the test characteristics is also constructed. This approach allows us not only to estimate the test characteristics, but also to analyze the robustness of the test.

Figure 3: Dependence of performance characteristics on probability of contamination ε

Table 1: Performance characteristics estimates

Table 2: The upper bounds for error probabilities

Table 3: The upper bounds for the average number of observations

ASYM(N). The numerical results for these main terms are presented in Table 4.

Table 4: The main terms of asymptotic expansions of the test characteristics