Robust Test for Detecting Changes in the Autocovariance Function of a Time Series

We propose a new robust test to detect changes in the autocovariance function of a time series. The test is based on empirical autocovariances of a robust transformation of the original time series. Because of the transformation, we do not require any finite moments of the original time series, making the test especially suitable for heavy tailed time series. We furthermore propose a lag weighting scheme, which puts emphasis on changes of the autocovariance at smaller lags. Our approach is compared to existing ones in some simulations.


Introduction
Testing for second order stationarity goes back to Quenouille (1958) and Jenkins (1961). To the best of our knowledge, the first test where the possible time of change is not known a priori can be found in Priestley and Rao (1969). Since then, many alternatives have been proposed. In Jin, Wang, and Wang (2015), estimated autocovariances of subsamples are compared to estimates based on the whole time series. Several tests have been proposed for linear models, see Bai (1993), Bai (1994), Andrews (1993), Davis, Huang, and Yao (1995), and Akashi, Dette, and Liu (2018). CUSUM-type tests to detect changes in one or several autocovariances have been derived in Berkes, Gombay, and Horváth (2009), Lee, Ha, Na, and Na (2003), and Dette, Wu, and Zhou (2015). A test based on the auto-copula has been proposed in Bücher, Fermanian, and Kojadinovic (2019). Tests that check stationarity of the spectrum are presented in Picard (1985), Giraitis and Leipus (1992), and Rozenholc (2001), and a wavelet periodogram is used in Nason (2013) and Cardinali and Nason (2018). There are also proposals that compare local estimates of the spectrum with a global estimate; see Von Sachs and Neumann (2000), Paparoditis (2009), Paparoditis (2010), Dette, Preuß, and Vetter (2011), and Preuß, Vetter, and Dette (2013). Surprisingly little attention has been paid to robustness. We want to fill this gap with a CUSUM type test based on robustified autocovariances. The testing procedure is described in Section 2. A simulation study in Section 3 indicates its usefulness.

Testing procedure
Denote by X = X_1, ..., X_T a one-dimensional time series which is stationary under the null hypothesis. We assume in the following that X has a continuous marginal distribution and is strongly mixing with mixing coefficients (a_k)_{k∈N} fulfilling a_k = O(k^(−3−ε)) for some ε > 0. Strong mixing was first introduced in Rosenblatt (1956) and describes how fast the dependence between two observations decreases as the time lag between them increases; see Bradley (2005) for more details. We only want to emphasize here that a broad class of time series models is strongly mixing, like linear and GARCH processes with continuously distributed innovations; see Chanda (1974) and Lindner (2009).

We want to test whether the autocovariance function of X stays the same, concentrating on the first p lags. We follow the approach of Dürre and Fried (2019) and use bounded transformations. Before applying them, the observations need to be properly standardized. Denote by μ̂ the sample median and by σ̂ the sample MAD of X, and by μ and σ their theoretical counterparts. Then we define

Ŷ_t = ψ_k((X_t − μ̂)/σ̂), t = 1, ..., T,

where ψ_k(x) = max(−k, min(x, k)) denotes the Huber-ψ function. This function was originally introduced for location estimation in Huber (1964) and basically downweights the influence of observations with large absolute values by shrinking them to more plausible values; see Figure 1. The tuning coefficient k determines the robustness of the test. A larger value of k is favourable under Gaussian time series, whereas a smaller k is needed if the data is corrupted or heavy tailed. In Huber (1981), k = 1.5 is recommended as a compromise.

In the following, we derive a CUSUM type test based on the Huber-transformed time series. Denote S_k^(l) = Σ_{t=1}^k Ŷ_t Ŷ_{t+l}; then we look at

R_T = max_{1≤k<T} (1/T) Σ_{l=0}^p w_l (S_k^(l) − (k/T) S_T^(l))²

with non-negative lag weights w_0, ..., w_p. Here are some remarks with regard to R_T:

• Technically, R_T tests against the alternative

H_1 : ∃ k < T : Cov(Y_1, Y_{1+l}) ≠ Cov(Y_{k+1}, Y_{k+1+l}) for some l ∈ {0, ..., p},

where Y_t = ψ_k((X_t − μ)/σ) denotes the transformed series. This is not equivalent to a test for a stationary autocovariance function up to lag p. For example, R_T will have problems detecting changes in the tail dependence, since extreme values are downweighted by ψ.
• The choice of p is crucial for the power of the test. If there is only a change in the first lag, a large p only adds noise and can mask the change point. On the other hand, if one chooses p small, one cannot detect changes in the higher lags. Furthermore, one has to keep in mind that the estimation of Σ gets very poor if p is large compared to T. As a rule of thumb, use p < T/20.
• In the multivariate context, it is common and beneficial to use the quadratic form with respect to Σ, the asymptotic long run covariance matrix of S_T; R_T becomes affine invariant in this case. In the time series context, this property is not desirable. The weights w_0, ..., w_p give us more flexibility. Usually one would choose descending weights to smooth the transition between the lags j = 0, ..., p of the autocovariance function, where one can detect a change, and the neglected lags j > p. If there is only a change in the first autocovariance and one chooses p large, the change could be masked by the noise from the other autocovariances. Descending weights counteract this problem. Without further knowledge, we suggest using w_i = 1 − i/p for i = 0, ..., p. If one uses weights instead of Σ, the asymptotic distribution of R_T depends on the actual dependence structure of X. Therefore, one cannot use tabulated asymptotic critical values. However, one can approximate the distribution of R_T by sampling Gaussian processes with the estimated covariance structure.

Now we describe how one can approximate the distribution of R_T under the null hypothesis. Under the above assumptions, one can use Theorem 1 of Dürre and Fried (2019). It is not explicitly stated there, but effectively proved in Propositions 1 and 2, that the partial sum process

(1/√T) (S_{⌊sT⌋} − s S_T), s ∈ [0, 1],

converges weakly to a (p+1)-dimensional Brownian bridge with covariance matrix Σ. Here, Σ is the asymptotic long run covariance matrix defined by

Σ^(l,m) = Σ_{h=−∞}^{∞} Cov(Y_0 Y_l, Y_h Y_{h+m}), l, m = 0, ..., p.

Proposition 3 in Dürre and Fried (2019) states that Σ can be consistently estimated by a kernel estimator. Denote therefore by b_T ≥ 0 a bandwidth and by k : R → [−1, 1] a kernel function. Then Σ̂ with the elements

Σ̂^(l,m) = Σ_{|h|<T} k(h/b_T) γ̂^(l,m)(h),

where γ̂^(l,m)(h) is the empirical cross covariance at lag h between the product series (Ŷ_t Ŷ_{t+l}) and (Ŷ_t Ŷ_{t+m}), is the related kernel estimator. Simulations indicate that the flat-top kernel proposed in Politis and Romano (1993) works well together with b_T = T^(1/3) under autoregressive processes of order 1.

One can generate random variables R̂_T^(i), i = 1, ..., m, that have asymptotically the same distribution as R_T under the null hypothesis by the following algorithm:

• generate (p + 1) · T independent standard normal random variables and store them in a T × (p + 1) matrix Z,
• reproduce the cross sectional dependence by multiplying Z with the factor L of the Cholesky decomposition Σ̂ = LLᵀ: set V = Z · L,
• calculate the weighted test statistic

R̂_T^(i) = max_{1≤k<T} (1/T) Σ_{l=0}^p w_l (Σ_{t=1}^k V_{t,l} − (k/T) Σ_{t=1}^T V_{t,l})².

By this algorithm, one can generate random variables to calculate approximate p-values very fast. We recommend using a modified Cholesky decomposition to safeguard against numeric instabilities, which could arise especially if T is small compared to p. We use the algorithm proposed in Schnabel and Eskow (1999) in our simulations.
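The procedure above can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: all function names are ours, the raw MAD is used without a consistency factor, and a Bartlett (Newey–West) kernel with a small ridge replaces the flat-top kernel and the modified Cholesky decomposition of Schnabel and Eskow (1999).

```python
import numpy as np

def huber_psi(x, k=1.5):
    """Huber psi function: identity on [-k, k], clipped outside."""
    return np.clip(x, -k, k)

def huber_transform(x, k=1.5):
    """Standardize by sample median and MAD, then apply psi_k."""
    med = np.median(x)
    mad = np.median(np.abs(x - med))
    return huber_psi((x - med) / mad, k)

def weighted_cusum(Z, w):
    """max_k (1/n) sum_l w_l (S_k^(l) - k/n * S_n^(l))^2, where S_k^(l)
    is the k-th partial sum of column l of Z."""
    n = Z.shape[0]
    S = np.cumsum(Z, axis=0)
    k = np.arange(1, n + 1)[:, None]
    dev = S - k / n * S[-1]
    return (dev ** 2 @ w).max() / n

def robust_cov_change_test(x, p=3, k=1.5, m=999, seed=None):
    """Approximate p-value for a change in the first p+1 autocovariances
    of the Huber-transformed series (sketch of the described procedure)."""
    rng = np.random.default_rng(seed)
    y = huber_transform(np.asarray(x, float), k)
    n = len(y) - p
    w = 1.0 - np.arange(p + 1) / p                 # decreasing weights w_i = 1 - i/p
    # columns: product series y_t * y_{t+l}, l = 0, ..., p
    Z = np.column_stack([y[l:n + l] * y[:n] for l in range(p + 1)])
    obs = weighted_cusum(Z, w)
    # kernel estimate of the long run covariance matrix (Bartlett kernel here;
    # the paper suggests the flat-top kernel with bandwidth T^(1/3))
    Zc = Z - Z.mean(axis=0)
    b = max(1, int(round(n ** (1 / 3))))
    Sigma = Zc.T @ Zc / n
    for h in range(1, b + 1):
        G = Zc[:-h].T @ Zc[h:] / n
        Sigma += (1 - h / (b + 1)) * (G + G.T)
    # small ridge instead of a modified Cholesky decomposition
    L = np.linalg.cholesky(Sigma + 1e-8 * np.eye(p + 1))
    reps = np.array([weighted_cusum(rng.standard_normal((n, p + 1)) @ L.T, w)
                     for _ in range(m)])
    return obs, (1 + np.sum(reps >= obs)) / (m + 1)
```

Under the null hypothesis of a stationary series, the returned Monte Carlo p-value should be roughly uniform; small values indicate a change in the first p + 1 autocovariances of the transformed series.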

Simulations
We assess our approach in a simulation study. We compare our method with tests for second order stationarity which are available in R (R Core Team 2019). Two wavelet based tests proposed in Nason (2013) and Cho (2016) are implemented in the packages locits (Nason 2016) and unsystation (Cho 2018), respectively. A revised version of the ANOVA test originally proposed in Priestley and Rao (1969) is implemented in the package fractal (Constantine and Percival 2017). We abbreviate these tests by Wav, Rpar, and Anova. These methods can cope with multiple break points, so we expect them to have inferior power in the one change-point setting. A copula based method (Bücher et al. 2019) is implemented in the package npcp (Kojadinovic 2019). This test can detect quite general changes in the dependence structure up to lag p, but no changes in the marginal variance. We choose p ∈ {3, 5} and abbreviate the tests as Copula3 and Copula5. Furthermore, we compare with tests that can detect changes in autoregressive processes: a likelihood based test (Davis et al. 1995) and a quantile based test (Qu 2008). Both methods require the order p of the AR process. We choose p ∈ {1, 3, 5} and denote the likelihood based tests by AR1, AR3, AR5 and the quantile based tests by Quan1, Quan3, Quan5. A test for a change in the spectrum (Picard 1985) is abbreviated as spec.
Our proposed methods depend on the maximal considered lag p ∈ {3, 5} and the robustness parameter k ∈ {1.5, 1000}. The former choice of k is robust, and the corresponding test is abbreviated as HCov. The latter choice is non-robust and similar to a covariance based test proposed in Berkes et al. (2009); this test is denoted by Cov. Furthermore, we compare different weights: linearly decreasing weights w_i = 1 − i/p, i = 0, ..., p, are indicated by d, constant weights w_i = 1, i = 0, ..., p, by e, and weights generated by the estimated asymptotic long run covariance matrix Σ̂ by s. In this notation, Cov5d denotes the non-robust test with k = 1000, p = 5, and decreasing weights.

We evaluate the behaviour under the null hypothesis. We look at AR(1) models X_t = ρX_{t−1} + ε_t for t = 1, ..., T with ρ ∈ {0, 0.8}, different distributions for (ε_t)_{t=1,...,T}, namely the standard normal and a t-distribution with 3 degrees of freedom, and different T ∈ {128, 256, 512}. Results based on 10000 repetitions are summarized in Table 1. Only Cov, HCov with decreasing and equal weights, Copula, and Quan give reliable results in all cases. Wav and Rpar are strongly oversized under t_3 innovations; spec is additionally anti-conservative for large ρ. Anova needs very large sample sizes. Furthermore, one can observe that Cov and HCov are conservative for small T in case of equal or decreasing weights. The weights based on Σ̂ lead to strongly anti-conservative tests if T is small and ρ is large.

To assess power under H_1, we first look at autoregressive models of order one:

X_t = ε_t, t = 1, ..., 128,
X_t = ρX_{t−1} + ε_t, t = 129, ..., 256,

for ρ ∈ {0, 0.05, ..., 0.55, 0.6} and ε_t ∼ N(0, 1) for t ∈ Z. The autocovariance function changes from γ(k) = 0 to γ(k) = ρ^k/(1 − ρ²) for k ∈ N. Results based on 10000 repetitions can be seen in Figure 2. In the first line on the left, we see different versions of our proposed method. The tests with decreasing weights clearly outperform the others.
Surprisingly, the robust tests have higher power than the non-robust ones. Furthermore, we notice that the test considering 5 lags does not lose much against the one that only considers 3 lags. On the right, we see different versions of AR, Copula, and Quan and notice that Copula3 clearly outperforms Copula5. Furthermore, Quan1 has considerably larger power than Quan3 and Quan5, whereas AR3 and especially AR5 have large size distortions under H_0. We choose the versions that behave best and compare them in the second row of Figure 2. There we see that Copula3 has the largest power for small ρ and AR1 for larger ρ. The former is quite remarkable, since copula based tests have power against arbitrary changes of the dependence structure. Our test HCov3d lacks some power compared to Copula3 and Quan1, which are also robust to some degree.

Now we evaluate the behaviour under heavy tails. We look at the same type of change-point model, where the innovations ε_t are now t-distributed with different degrees of freedom df ∈ {1, ..., 10}. Results are based on 10000 repetitions. In the first line of Figure 3 on the left, one can see that HCov3d has the largest power. Usually, HCov gains power under more heavy tailed innovations, which correspond to small df, and Cov loses power. The exception is Covs, which is anti-conservative under heavy tails. In Figure 3 on the right, we see that Copula3 considerably outperforms Copula5, whereas Quan1 has larger power than Quan3 and Quan5. We decided against adding the results of AR3 and AR5, since they are strongly anti-conservative under heavy tails. We compare the best tests at the bottom of Figure 3 and notice that Quan1 has the largest power apart from df = 1. In this case, HCov3d is most powerful.

Finally, we want to evaluate the influence of the time lag at which the autocovariance function changes. We consider a model where ε_t ∼ N(0, 1) and the autocovariance function changes from γ(k) = I_{k=0} to γ(k) = 1.8 I_{k=0} + 0.8 I_{k=r, r≠0} for k ∈ N_0 and r = 0, ..., 10.
Note that r = 0 corresponds to a change of the marginal variance. Results based on 10000 repetitions can be seen in Figure 4. In the first line on the left, we see different versions of HCov and Cov. It is not clear which one behaves best overall. All tests have the largest power if there is only a shift in the variance (r = 0). Furthermore, they have power > 0.05 for all r, since there is always a change in the marginal variance. The power of the tests with decreasing weights decreases as expected, whereas the power of the tests with equal weights increases slightly from r = 1 to r = p. Usually, Cov has larger power than HCov for r = 0 and r = p + 1, ..., 10. On the right, we compare different versions of Copula, AR, and Quan. Copula3 has uniformly larger power than Copula5, AR3 outperforms AR1, and Quan5 dominates Quan3 and Quan1. At the bottom of Figure 4, we compare the best tests and notice that spec behaves best overall. Anova, Copula, Quan, and AR have problems detecting changes in the marginal variance (r = 0), and Quan, Anova, Copula, and Wav have problems if r is large.

We conclude that no test dominates the others in all considered situations. Our proposal HCov behaves well in all of them. It is the only test which can cope with heavy tailed distributions and detect changes in the marginal variance simultaneously. We notice that decreasing weights yield good results, especially under autoregressive processes. The choice of p influences the power of the test; it seems that choosing p too large is better than choosing it too small. In all simulations, a misspecification of p is not as fatal as a misspecification of the similar tuning parameters of the competing methods AR, Quan, and Copula.
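As a complement, the first change-point model of the power study can be sketched and checked directly; the following snippet (helper names are ours) simulates the piecewise process and compares the empirical lag-1 autocovariance of both segments with the stated formula γ(k) = ρ^k/(1 − ρ²):

```python
import numpy as np

def ar1_change(T=256, t0=128, rho=0.4, rng=None):
    """White noise up to time t0, AR(1) with parameter rho afterwards."""
    rng = np.random.default_rng(rng)
    eps = rng.standard_normal(T)
    x = np.empty(T)
    x[:t0] = eps[:t0]
    for t in range(t0, T):
        x[t] = rho * x[t - 1] + eps[t]
    return x

def autocov(x, lag):
    """Empirical autocovariance at a given lag (no mean correction,
    since the innovations are centered)."""
    n = len(x) - lag
    return np.dot(x[:n], x[lag:]) / n

# long segments so the empirical autocovariances are close to their limits
rho = 0.4
x = ar1_change(T=200_000, t0=100_000, rho=rho, rng=0)
gamma1_before = autocov(x[:100_000], 1)   # close to 0
gamma1_after = autocov(x[100_000:], 1)    # close to rho / (1 - rho**2)
```

The simulations in the paper use much shorter series (T = 256); the long segments here only serve to make the empirical values visibly match the theoretical autocovariance change.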