Discriminating between Long Memory and Volatility Shifts

We develop a practical implementation of the test proposed in Berkes, Horváth, Kokoszka, and Shao (2006) designed to distinguish between a change-point model and a long memory model. Our implementation is calibrated to distinguish between a shift in the volatility of returns and long memory in squared returns. It uses a kernel estimator of the long-run variance of squared returns with the maximal lag selected by a data-driven procedure that depends on the sample size, the location of the estimated change point, and the direction of the apparent volatility shift (increase versus decrease). In a simulation study, we also consider other long-run variance estimators, including the VARHAC estimator, but we find that they lead to tests with inferior performance. Applied to returns on indexes and individual stocks, our test indicates that even for the same asset, a change-point model may be preferable for a certain period of time, whereas there is evidence of long memory in another period. Generally, there is stronger evidence for long memory in the eight years ending June 2006 than in the eight years starting January 1992. This pattern is most pronounced for US stock indexes and shares in the US financial sector.


Introduction
Long memory, or long-range dependent, stochastic processes have over the last two decades been extensively used in modeling time series arising in areas as diverse as geophysics (most notably hydrology and climatology), engineering, medicine, and computer networks. The 1980s saw applications of long memory processes in modeling macroeconomic time series, whereas the 1990s witnessed an increased interest in modeling the volatility of returns on speculative assets by such processes; see Henry and Zaffaroni (2002) for a discussion and relevant references. Doukhan, Oppenheim, and Taqqu (2002) provide an extensive review of the theory and applications of long-range dependent models.
A stationary long memory stochastic process exhibits persistent aperiodic cycles which, in finite samples, can create the appearance of changes in the mean level or, if long memory in squares is of interest, in variability. It has recently been argued that the empirical evidence for long memory can be attributed to the presence of trends or structural breaks in the data. A slowly decaying autocovariance function and a spectral density which follows a line near the origin on the log-log scale are the simplest manifestations of long memory. It is by now well documented that models incorporating various forms of nonstationarity also exhibit these and other features of stationary long memory models. The last few years have seen an intensified debate over which modeling approach is more appropriate. Bhattacharya, Gupta, and Waymire (1983) and Giraitis, Kokoszka, and Leipus (2001) showed that statistics computed from short memory processes perturbed by trends or shifts in the mean may exhibit the same properties as those of long-range dependent processes. The confusion between long memory and change-points is also reflected by the fact that tests for long memory typically reject in the presence of change-points, and many change-point tests reject in the presence of long memory; see Hidalgo and Robinson (1996) and Krämer and Sibbertsen (2000) for detailed examples and discussion. It is thus seen that despite their different mathematical formulations, long-range dependent processes and structural change models can describe the same phenomena.
There have, correspondingly, been two competing positions on how to model the volatility of financial returns. One is in favor of long memory, the other prefers regime changes. Dacorogna, Müller, Nagler, Olsen, and Pictet (1993), Ding, Granger, and Engle (1993), Granger and Ding (1996), Baillie, Bollerslev, and Mikkelsen (1996), Bollerslev and Mikkelsen (1996), Andersen and Bollerslev (1997), Chambers (1998), and Bollerslev and Wright (2000), among others, advocate long memory models. The main conclusion of these studies is that the volatility of financial returns can be well described by long-range dependent models. Other researchers have argued that the manifestations of long memory observed in the volatility of financial returns are due to unaccounted-for structural breaks. Diebold (1986), Lamoureux and Lastrapes (1990), Mikosch and Stărică (1999, 2002), and Diebold and Inoue (2001), among others, contended that the long-range dependence in the conditional variance of returns is, in fact, a manifestation of changes in parameters or in the unconditional variance. These ideas have recently been further developed by Stărică and Granger (2005). Krämer, Sibbertsen, and Kleiber (2002) observed that long memory in squares of German stock returns disappears once shifting means are properly accounted for. Econometric models where structural change can be modeled endogenously have been proposed by Cai (1994), Hamilton and Susmel (1994), and Dueker (1997), to name just a few contributions.
An intermediate position of combining long memory and level shifts has also emerged. Bos, Franses, and Ooms (1999) found evidence of long-range dependence in the inflation rates of the G7 countries, and observed that the addition of a set of level shifts does decrease the intensity of this dependence for some countries. A similar conclusion was reached by Teyssière and Abry (2006), who applied a wavelet-based estimator, which is less sensitive to trends and level shifts. The wavelet estimator indicated a substantially lower intensity of long memory. Andreou and Ghysels (2002) examined estimating change-points in the presence of long memory. Using exchange rate data, Morana and Beltratti (2004) concluded that superior forecasts can be obtained at longer horizons by modeling both long memory and structural changes.
It is clear that both long memory and change-point models have merits: a long memory model may provide a parsimonious description of a long, possibly non-stationary, time series and may be useful for long-term forecasting, whereas a change-point model may be more suitable for short-term forecasts. In some financial applications, however, the choice between long memory and structural breaks is important, particularly in risk measurement, asset allocation, and option pricing. To name just a few references, Bollerslev and Mikkelsen (1996) showed that taking into account a long memory structure of the volatilities can sometimes even double the price of options compared with situations in which long memory is neglected. Garcia and Ghysels (1998) studied the effect of structural breaks on asset pricing, whereas Pastor and Stambaugh (2001) focused on their effect on equity premiums.
Formal statistical tests which help decide whether a long-range dependent process or a weakly dependent process with change-points better fits a particular time series are therefore of value. There has, however, not been much research in this direction. The periodogram-based testing procedure of Künsch (1986) was developed to distinguish between a long-range dependent process and the process X_k = Y_k + f(k) with a monotonic function f and Gaussian weakly dependent Y_k. Heyde and Dai (1996) constructed tests for detecting long-range dependence which are based on a smoothed periodogram and are robust in the presence of small trends. Sibbertsen and Venetis (2004) recently further developed these ideas, while Jach and Kokoszka (2008) proposed a test based on a wavelet domain likelihood. These tests are, however, not directly applicable to discriminating between long memory and volatility shifts in financial time series because they were designed for linear time series models.
In this paper we develop and compare several practical implementations of the test proposed by Berkes et al. (2006) and apply them to stock indexes and individual stocks. Denote by {r_t, t = 1, ..., n} the returns on a speculative asset and by X_t = r_t^2 the squared returns. Under the null hypothesis the X_t follow a change-point model; under the alternative they are long-range dependent. The testing procedure is based on a CUSUM statistic for the partial sums and involves estimating the long-run variances of subsamples obtained by dividing the observations into two parts, before and after a potential change-point. These variances are estimated, respectively, by estimators s_{n1}^2 and s_{n2}^2 which require the selection of a lag parameter q, defined in Section 2. The performance of several selection methods is evaluated via a simulation study. The best performance is obtained by a data-driven procedure which incorporates the pattern of the volatility shift and uses different lags depending on whether the volatility increases or decreases over the observed sample. Applied to returns on indexes and individual stocks, our test indicates that even for the same asset, a change-point model may be preferable for a certain period of time, whereas there is evidence of long memory in another period. Generally, there is stronger evidence for long memory in the eight years ending June 2006 than in the eight years starting January 1992. This pattern is most pronounced for US stock indexes and shares in the US financial sector.
The paper is organized as follows. We introduce the testing procedure in Section 2. Section 3 discusses several estimators s_{n1}^2 and s_{n2}^2. Using the models obtained in Section 4 from daily returns on representative stocks and indexes, we study the finite sample performance of the tests by means of simulations in Section 5. In Section 6, the test is applied to a wide selection of indexes and to individual stocks grouped by sector. We summarize our results in Section 7.

The Test Procedure
Recall that X_t = r_t^2, where {r_t, t = 1, ..., n} are returns on a speculative asset which are assumed to have mean zero.
The X_t follow a change-point model if

X_t = µ + ∆·1{t > k*} + Y_t, t = 1, ..., n. (2.1)

In (2.1), k* is the unknown time of a change in mean; the means µ and µ + ∆ are also unknown. The sequence {Y_t} is assumed to have mean zero and to be stationary and weakly dependent. In our context, Y_t = r_t^2 − E r_t^2. We wish to test

H_0: The observations X_t follow model (2.1)

versus
H_A: The observations X_t are long-range dependent.
Observe that the unknown means µ and µ + ∆ in (2.1) are the variances of the returns. We thus test to discriminate between long-range dependence in variance and changes in variance. Berkes et al. (2006) discuss an extension of the above testing problem which allows multiple change-points under H_0. However, visual inspection of the data stretches we study (see Figure 1) shows that it is reasonable to assume at most one change-point.
Examples of models satisfying H_0 and H_A are given in Berkes et al. (2006). Roughly speaking, under H_0 the autocovariances of Y_t = r_t^2 − E r_t^2 must decay exponentially, and under H_A the autocovariances of r_t^2 must decay hyperbolically (and the process r_t^2 must be stationary).
The testing procedure first estimates the change point k* using the CUSUM estimator

k̂ = min{ k : |U_k| = max_{1≤j≤n} |U_j| }, where U_k = Σ_{t=1}^{k} X_t − (k/n) Σ_{t=1}^{n} X_t. (2.2)

Setting n_1 = k̂ and n_2 = n − k̂, we then separate the realization into two sub-series, one before and one after the break, denoted, respectively, by {X_t, t = 1, ..., n_1} and {X_t, t = n_1 + 1, ..., n}. We next define the statistics

T_{n,1} = (s_{n1} √n_1)^{−1} max_{1≤k≤n_1} | Σ_{t=1}^{k} X_t − (k/n_1) Σ_{t=1}^{n_1} X_t |, (2.3)

based on {X_t, t = 1, ..., n_1}, and

T_{n,2} = (s_{n2} √n_2)^{−1} max_{n_1<k≤n} | Σ_{t=n_1+1}^{k} X_t − ((k − n_1)/n_2) Σ_{t=n_1+1}^{n} X_t |, (2.4)

based on {X_t, t = n_1 + 1, ..., n}. In the definitions above, s_{n1}^2 and s_{n2}^2 are estimates of the long-run variance of {X_t, t = 1, ..., n_1} and {X_t, t = n_1 + 1, ..., n}, respectively. The idea of the test is that under H_0, T_{n,1} and T_{n,2} have the same asymptotic distribution as if the two sub-series did not contain a change-point. This is because the difference between k* and k̂ is asymptotically negligible. Under H_A, T_{n,1} and T_{n,2} diverge to infinity because they are computed from long memory sequences.
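As an illustration, the CUSUM change-point estimator in (2.2) amounts to maximizing the absolute centered partial sums. A minimal Python sketch (the function name and interface are ours, not from the paper):

```python
import numpy as np

def cusum_change_point(x):
    """Estimate the change point as the (first) maximizer of the CUSUM process
    U_k = sum_{t<=k} x_t - (k/n) * sum_{t<=n} x_t  (cf. (2.2))."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    partial = np.cumsum(x)
    k = np.arange(1, n + 1)
    u = partial - k / n * partial[-1]
    return int(np.argmax(np.abs(u))) + 1  # 1-based index of the estimated break
```

Splitting the sample at the returned index then gives the two sub-series on which T_{n,1} and T_{n,2} are computed.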
Under H_0, the test statistic

M_n = max(T_{n,1}, T_{n,2}) (2.5)

has a known limiting distribution:

M_n →_d M = max( sup_{0≤t≤1} |B^(1)(t)|, sup_{0≤t≤1} |B^(2)(t)| ),

where B^(1) and B^(2) are independent Brownian bridges; see Corollary 2.1 of Berkes et al. (2006). Table 1 gives asymptotic critical values c(α) defined by P(M > c(α)) = α. A main practical difficulty with the implementation of this testing procedure is the estimation of the long-run variances. Berkes et al. (2006) used in their small simulation study the Bartlett estimators

s_{n1}^2 = γ̂_0 + 2 Σ_{i=1}^{q} w_i(q) γ̂_i, (2.6)

where the γ̂_i are the sample autocovariances of {X_t, t = 1, ..., n_1}, with s_{n2}^2 defined analogously from {X_t, t = n_1 + 1, ..., n}, (2.7), and the Bartlett kernel is w_i(q) = 1 − i/(q + 1). Berkes et al. (2006) used the deterministic lag q(n) = 15 log_10(n), which is 50% larger than the default S-PLUS maximum lag for the autocorrelation function, 10 log_10(n). This choice produces reasonable results, but we will see in this paper that different, data-driven lag choices, or even different estimators, give much better results. The problem of variance estimation is taken up in detail in Section 3.

Austrian Journal of Statistics, Vol. 36 (2007), No. 4, 253-275
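To make the ingredients concrete, the following sketch implements the Bartlett estimator (2.6) with a generic lag q and the CUSUM statistic of a single (sub-)sample; M_n in (2.5) is then the maximum of the two sub-sample statistics. The names are ours, and this is an illustration under the definitions above, not the authors' code:

```python
import numpy as np

def bartlett_lrv(x, q):
    """Bartlett-kernel long-run variance estimate:
    gamma_0 + 2 * sum_{i=1}^{q} (1 - i/(q+1)) * gamma_i,
    where gamma_i are the sample autocovariances of x."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    s2 = np.dot(xc, xc) / n  # gamma_0
    for i in range(1, q + 1):
        gamma_i = np.dot(xc[i:], xc[:-i]) / n
        s2 += 2.0 * (1.0 - i / (q + 1.0)) * gamma_i
    return s2

def cusum_statistic(x, q):
    """CUSUM statistic of one sub-sample:
    max_k |S_k - (k/n) S_n| / (s * sqrt(n)), with s^2 from bartlett_lrv."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    partial = np.cumsum(x)
    k = np.arange(1, n + 1)
    u = np.abs(partial - k / n * partial[-1])
    return u.max() / (np.sqrt(bartlett_lrv(x, q)) * np.sqrt(n))
```

With q = 0 the Bartlett estimate reduces to the sample variance, which is the correct long-run variance for uncorrelated data; larger q adds weighted autocovariances.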

Variance Estimation
In this section, we describe four estimators of s_{n1}^2 (and s_{n2}^2) appearing in (2.3) (and (2.4)). There are many variants of each estimation method listed below, but we found that the specifications we selected reflect well the characteristics of each method. We give each method a short name and an abbreviation.
1) Deterministic, q_n. Set q_n(n_1) = 10 log_10(n_1) and q_n(n_2) = 10 log_10(n_2). These are the default values in many statistical packages. Replacing 10 by a different constant improves performance in some cases, but makes it worse in others.
2) Data driven, q*_arma. Assume that under H_0, GARCH(1,1) is the underlying model for the returns before and after the change-point. Specifically, assume that the volatility σ_t evolves according to

σ_t^2 = ω + α r_{t−1}^2 + β σ_{t−1}^2, (3.1)

with nonnegative parameters α and β such that α + β < 1. The X_t = r_t^2 then follow an ARMA(1,1) model with autoregressive coefficient ρ = α + β and moving average coefficient ψ = −β; see Hamilton (1994, pp. 665-666). By minimizing the asymptotic truncated MSE, Andrews (1991) derived the optimal bandwidth q* for a class of real-valued kernel functions. The optimal truncation lag for the Bartlett kernel is given by

q* = 1.1447 (a(X) n)^{1/3}, (3.2)

where a(X) is a function of the unknown spectral density f(λ). For ARMA(1,1) models with autoregressive parameter ρ and moving average parameter ψ, the estimate of a(X) is given by Eq. (6.6) in Andrews (1991), in which an integer p is involved. Setting p = 1 yields a closed-form expression (3.3) for a(X) in which ρ and ψ enter through appropriate estimates. To obtain them, we first calculate k̂ in (2.2) and split the return series into sub-series before the break, {r_t, t = 1, ..., n_1}, and after the break, {r_t, t = n_1 + 1, ..., n}, where n_1 = k̂. Focusing on the first sub-series, we compute the quasi-maximum likelihood estimates (QMLE) α̂_1 and β̂_1 of the GARCH(1,1) model. Setting ρ̂_1 = α̂_1 + β̂_1 and ψ̂_1 = −β̂_1 and plugging them into (3.3), we get the estimate â_1(X). As the final step, we calculate the optimal bandwidth q*_arma(n_1) for the squared series before the change point by substituting â_1(X) for a(X) and n_1 for n in (3.2). The lag q*_arma(n_2), n_2 = n − n_1, is computed in the same way for the series after the change point.

3) Modified data driven, q*_c·arma. As will be seen in the simulation study in Section 5, the empirical sizes of the test using q*_arma are generally much lower than the nominal ones, indicating that q*_arma is too long to give satisfactory results. We therefore reduce the maximum lags by multiplying both q*_arma(n_1) and q*_arma(n_2) by a factor c less than 1. More importantly, our numerical experiments indicated that the constant c must depend on both n and k̂, and even on the pattern of the apparent volatility change. Specifically, denote by Var(X_{n_1}) and Var(X_{n_2}) the sample variances of the squared sub-series, {X_t, t = 1, ..., n_1} and {X_t, t = n_1 + 1, ..., n}, before and after the estimated change point k̂, respectively. The factor c is determined as follows:

c = log_10(n/100) · (n_1/n + 1.5)^{−1} if Var(X_{n_1}) ≤ Var(X_{n_2}), (3.4)

c = log_10(n/100) · (n_2/n + 1.5)^{−1} if Var(X_{n_1}) > Var(X_{n_2}). (3.5)

The rationale will be explained in Section 5, where the effects of n and k* on empirical sizes are discussed in detail. In the sequel, we denote by q*_c·arma the product of c and q*_arma.

4) Prewhitening. This method does not depend on choosing a truncation lag, but uses the VARHAC (vector autoregression heteroskedasticity and autocorrelation consistent) procedure proposed by Den Haan and Levin (1996). We first fit an AR(b) model with the autoregressive order b chosen by Akaike's Information Criterion (AIC). Denote by ρ̂_i, i = 1, 2, ..., b, the ith estimated autoregressive coefficient, and by σ̂²_PW.resid the sample variance of the residuals. Obtaining these residuals is referred to as prewhitening. The estimate of the long-run variance is then

s² = σ̂²_PW.resid / (1 − Σ_{i=1}^{b} ρ̂_i)². (3.6)

We apply the procedure to the two squared sub-series to obtain s_{n1}^2 and s_{n2}^2, respectively. Originally proposed by Press and Tukey (1956), prewhitening has long been used in the time series literature to reduce the bias of kernel-based spectral estimation, especially in the presence of strong temporal dependence. The AR model is not meant to be the true model of the underlying process; instead, it is used as a tool to "soak up" some of the dependence (cf. Andrews and Monahan, 1992, p. 954). The VARHAC method differs in two respects from the prewhitened kernel estimator of Andrews and Monahan (1992). First, they use an AR model of fixed order, AR(1), in their simulations, instead of an order chosen by AIC. Second, instead of using the sample variance σ̂²_PW.resid in (3.6) to construct the variance estimate s², they follow a bandwidth selection procedure by fitting an AR(1) model to the prewhitened residuals and then use a kernel function to obtain the variance of the residuals. They do so to reduce the dependence that might still exist in the prewhitened residuals. Den Haan and Levin (2000) highlight the pitfalls of using a fixed order of VAR prewhitening and find relatively little benefit from applying a kernel-based method to the prewhitened residuals. Our simulations confirm their findings. Indeed, for the squares of returns, the AR(b) prewhitened residuals are so close to white noise that fitting an AR(1) model yields an autoregressive coefficient not significantly different from zero.
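A minimal sketch of the VARHAC estimator just described, with the AR order chosen by AIC over a small range. The order cap and the OLS fitting of the AR coefficients are our simplifications of the procedure:

```python
import numpy as np

def varhac_lrv(x, max_order=8):
    """VARHAC long-run variance: fit AR(b) by OLS with b chosen by AIC,
    then s^2 = var(residuals) / (1 - sum of AR coefficients)^2  (cf. (3.6))."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    best_aic, best = np.inf, None
    for b in range(1, max_order + 1):
        Y = x[b:]
        # design matrix whose i-th column holds the lag-i values of x
        X = np.column_stack([x[b - i:n - i] for i in range(1, b + 1)])
        coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
        resid = Y - X @ coef
        sigma2 = resid @ resid / len(Y)
        aic = len(Y) * np.log(sigma2) + 2 * b
        if aic < best_aic:
            best_aic, best = aic, (coef, sigma2)
    coef, sigma2 = best
    return sigma2 / (1.0 - coef.sum()) ** 2
```

For a stationary AR(1) process with coefficient φ and innovation variance σ², the long-run variance is σ²/(1 − φ)², which this estimator should approximate in large samples.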

Patterns of Volatility
In this section we study the daily returns on major US stocks and stock indexes with the objective of finding the typical model changes, and the implied variance changes, of real financial time series. In the simulation study in Section 5 we will use these practically relevant models to compare the finite sample performance of the various variance estimators introduced in Section 3. The criteria for choosing the best estimator are not, however, the usual variance/bias criteria, but rather the closeness of the empirical size to the nominal size and the effective power of the test considered in this paper applied to practically relevant models.
We consider two indexes: the Dow Jones Industrial Average (DJIA) for the period from 1/1/1992 to 12/31/1999 and the National Association of Securities Dealers Automated Quotations Composite index (NASDAQ) covering the period from 7/1/1994 to 12/31/1998. In addition, we consider four constituent stocks of the DJIA: General Electric, Wal-Mart Stores, Inc., and American Express, all for the period from 1/1/2000 to 6/30/2004, and the Altria Group, Inc. for the period from 7/1/1997 to 12/31/1999. The coverage periods of the time series range from two and a half years to eight years, with the number of observations varying between 631 and 2021. We work with log returns r_t = 100 log(P_t/P_{t−1}), where P_t denotes the index value or stock price at time t. For each of the time series, we first estimate the break point of volatility using k̂ in (2.2), then separate the returns data with respect to the change-point and fit a GARCH(1,1) model to these sub-series so as to estimate the models before and after the break.
Figure 1 presents the time series plots of the six data sets, with the same limits on the y-axis. The locations of the estimated break points of volatility are marked by a dashed line. The DJIA is a price-weighted average of thirty blue chip companies, and NASDAQ, comprised of more than 5000 domestic and foreign companies, is a weighted index based on market value. We thus expect, as clearly shown in the plots, that the volatilities of the indexes should be relatively lower than those of individual stocks. It is also seen that, during the two and a half years starting from 2001, the returns of the three stocks, GE, Wal-Mart, and American Express, exhibit somewhat similar patterns. For instance, the estimated breaks of the three time series occur close together, approximately in the fourth quarter of 2002, and, moreover, the volatilities all decrease after the change-points. This may be attributed to the steady increase of stock prices since then. In contrast, during the time periods considered for the DJIA, NASDAQ, and Altria Group, the returns are more volatile after the change-points.
The GARCH(1,1) model with mean µ was fitted to the sub-series of the six time series. Table 2 gives the results. The sample variance and implied variance of each stretch of data are also reported. Sample variances are computed in the usual way. Provided that α + β < 1, the unconditional variance of the returns implied by the model is ω/(1 − α − β); we refer to this quantity as the implied variance (implied by the model). The closeness of the two variances reflects a reasonable fit of the models to the data. Relevant to our study, we distinguish two categories of models. Represented by the DJIA, NASDAQ, and Altria Group, the volatilities of the first group increase after the breaks, while the returns of the second group, represented by GE, Wal-Mart, and American Express, have more substantial swings before the breaks. Notice that in the first group the sum of α and β does not change much after the breaks, but the values of ω do increase. For the second group of models, α + β increases slightly after the change-points, but ω drops considerably. We thus conclude from the expression ω/(1 − α − β) that the increase (decrease) of the parameter ω is the main cause of the ups (downs) of the volatilities of the returns. As we observed in Figure 1, the volatility of indexes is much lower than that of stocks; Table 2 gives precise numerical values. The sample variance of the DJIA increases from 0.424 to 1.465, and that of NASDAQ from 1.133 to 3.57, whereas the smallest volatility change we observe for stocks is 1.85 for Wal-Mart and the largest is 8.482 for Altria Group. These six time series thus form a representative sample capturing the typical behavior of return series.
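The implied variance just defined is a simple ratio of the fitted parameters. A one-line check follows; the parameter values in the comment are hypothetical, not taken from Table 2:

```python
def implied_variance(omega, alpha, beta):
    """Unconditional variance of a GARCH(1,1) process, omega / (1 - alpha - beta);
    defined only when alpha + beta < 1 (covariance stationarity)."""
    if alpha + beta >= 1:
        raise ValueError("alpha + beta must be < 1")
    return omega / (1.0 - alpha - beta)

# Hypothetical parameters omega = 0.05, alpha = 0.08, beta = 0.90
# give an implied variance of 0.05 / 0.02 = 2.5.
```

This makes the discussion above concrete: with α + β held roughly fixed, the implied variance moves proportionally with ω.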

Simulation Study
We compare the finite sample performance of the tests using the variance estimators introduced in Section 3. Since the primary concern is the type I error, the first goal is to find a test that outperforms the others in terms of empirical size.
We use the change-point models obtained from the returns on DJIA, NASDAQ, GE, and Wal-Mart and generate squared GARCH(1,1) observations of length n = 500, 1000, 1500, and 2000, with breaks at k* = n/4, n/2, and 3n/4. For each series, the test statistic M_n is computed, and using the critical values in Table 1, a decision on whether to reject H_0 at α = 1%, 5%, or 10% is made. We repeat the process R = 5000 times. The empirical size is then the number of rejections divided by R. Once we find the procedure that gives the best empirical sizes, we validate its performance by applying it to the models for American Express and Altria Group. Its robustness to model mis-specification will also be investigated.

Table 3: Empirical sizes (in percent) of the test using four different variance estimators applied to simulated series of squared GARCH(1,1) observations following the change-point models estimated from the returns on DJIA, NASDAQ, GE, and Wal-Mart in Table 2. The number of replications is R = 5000, with nominal size α = 5%.

Figure 2: Comparison of empirical sizes for truncation lags q*_arma and q*_c·arma. Results are reported for the change-point models for DJIA and GE (Model 5 → Model 6). The straight line indicates the nominal size of 5%.
The empirical sizes of the test at the nominal significance level α = 5% are presented in Table 3; the results for the 1% and 10% levels are not presented to conserve space, but are available upon request. The properties of the 1% and 10% tests are broadly the same as those of the 5% test, on which we focus. The empirical sizes for the deterministic bandwidth q_n are very unstable, varying from 0.42% to 39.36%. It appears that the dependence structure of squared GARCH(1,1) observations is far richer than lags that are merely proportional to the log of the sample size can capture. The lack of stability is also the main drawback of the prewhitening method. It yields good sizes for some models, but suffers from over-rejections, especially for the DJIA-based model. Generally speaking, the empirical sizes corresponding to the lags q*_arma are always well under the nominal levels, with the only exceptions occurring when the changes are from Model 3 to Model 4 for NASDAQ with k* = n/4 and from Model 7 to Model 8 for Wal-Mart with k* = 3n/4. From how the test statistic M_n is constructed, one can see that under-rejections are caused by over-estimation of the long-run variance, that is, both s_{n1}^2 and s_{n2}^2 in (2.6) and (2.7), respectively, are too large, which in turn leads us to conclude that the lags q*_arma are too long to yield good empirical sizes. We thus propose the lag q*_c·arma, the product of q*_arma and a factor c < 1. A constant value of c does not, however, work well for time series of varying lengths with different locations of the change-point. We get good rejection rates when, for instance, the length is 1000 with k* = 500 using c = 0.5, but the empirical size for n = 2000 and k* = 1500 then turns out to be twice the target level. The reason is that the sample size and the location of the break jointly influence the empirical sizes. The factor c thus needs to take into account both n and k*. Two plots are presented in Figure 2 to illustrate the effects of n and k* on the empirical sizes, and also to show the improvements resulting from using the lags q*_c·arma over q*_arma.

We now provide a rationale for the expressions for c in (3.4) and (3.5). It is clear that when the lags are determined by q*_arma, the sample size n and the empirical rejection rate are positively associated, i.e., the more observations, the higher the rejection rate. To alleviate this effect, we need a relatively wider bandwidth to taper the test statistic M_n and get fewer rejections for longer series. The gradually increasing function log_10(n/100) in (3.4) and (3.5) serves that purpose. Now focus on the left panel of Figure 2, where the model change is from 1 to 2 for DJIA, with volatility increasing after the break. Notice that as k* increases toward the end of the series, the empirical sizes decrease, for every n. By contrast, in the right panel, where the volatility decreases after the break, we observe essentially the opposite situation. Thus, depending on whether the volatility increases or decreases after the apparent break, we should use a different c. Specifically, for change-point models with volatility increasing after the break, we use shorter lags to increase the value of M_n and get relatively more rejections for greater k*. That is why, when Var(X_{n_1}) ≤ Var(X_{n_2}), the term (n_1/n + 1.5)^{−1} in the expression for c in (3.4) is used; its value decreases for larger n_1 = k̂, the estimate of the change-point k*. Clearly, we should do the opposite for models with volatility decreasing after the break, i.e., when Var(X_{n_1}) > Var(X_{n_2}) we extend the bandwidth for greater k*. In (3.5), the term (n_2/n + 1.5)^{−1} is used instead in the determination of c; its value increases, recalling that n_2 = n − n_1, as the estimated break gets farther from the beginning of the series.
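Putting (3.4) and (3.5) together as described above, the factor c can be sketched as follows (the function name is ours; the piecewise form follows the expressions given in Section 3):

```python
import math

def bandwidth_factor_c(n, n1, var_before, var_after):
    """Factor c multiplying the lag q*_arma (cf. (3.4)-(3.5)):
    c = log10(n/100) / (n1/n + 1.5)  if the variance increases at the break,
    c = log10(n/100) / (n2/n + 1.5)  otherwise, with n2 = n - n1."""
    n2 = n - n1
    if var_before <= var_after:  # apparent volatility increase
        return math.log10(n / 100.0) / (n1 / n + 1.5)
    return math.log10(n / 100.0) / (n2 / n + 1.5)
```

For example, with n = 1000 and n1 = 250, a volatility increase gives c = 1/1.75 ≈ 0.571, while a decrease gives c = 1/2.25 ≈ 0.444, matching the intended asymmetry between the two break patterns.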
The advantage of using the lags q*_c·arma as opposed to q*_arma is illustrated in Figure 2. The empirical sizes yielded by the modified bandwidth are very close to the nominal level of 5%, with positive and negative disparities of less than 2%. Unfortunately, there are some serious over-rejections in Table 3. When the changes are from Model 3 to Model 4 for NASDAQ with k* = n/4 and from Model 7 to Model 8 for Wal-Mart with k* = 3n/4, we see rejection rates as high as 15.24% at the nominal 5% level. Indeed, over-rejections as severe as this are to be expected: even for the longer lags q*_arma, with under-rejection as the dominating pattern, we get unusually high rejection frequencies for these two particular change-point models. Similar over-rejections are also observed for the 1% and 10% tests. Nevertheless, out of the four variance estimators considered in our simulations, we recommend the use of the Bartlett kernel estimator with truncation lags determined by q*_c·arma, even though it may occasionally lead to over-rejections when the change-point occurs in the vicinity of either the start or the end of the series. This procedure yields good empirical sizes when the location of the break is close to the middle of the time series.
We next validate the procedure using the lags q*_c·arma by applying it to the models for American Express and Altria Group in Table 2. Instead of considering different sample sizes with varying locations of the breaks, we set n and k* equal to the values obtained from the real time series. The empirical sizes at significance levels 1%, 5%, and 10% are presented in Table 4. The procedure works quite well; equally good performance is observed for other models derived from real data for which the breaks occur approximately halfway through the series.
Since the lag q*_c·arma is based on the estimation of a GARCH model, a usual criticism of such a parametric method is that a mis-specification of the model can lead to a large rejection bias. We address this issue by assessing the empirical sizes of the procedure using different models for the data generating processes. Specifically, we consider two asymmetric GARCH(1,1) models, the Exponential GARCH (EGARCH) and the Threshold GARCH (TGARCH), which are widely used in practice. These two models were developed to incorporate the asymmetric impact of news, also referred to as the leverage effect, on the volatility of financial time series, which tends to rise in response to negative shocks and fall in response to positive shocks.
Table 6: Empirical sizes (in percent) of the test using the Bartlett kernel estimator with lags q*_c·arma applied to simulated series of squared EGARCH(1,1) and TGARCH(1,1) observations following the change-point models estimated from the returns on DJIA, Altria Group, GE, and American Express in Table 5. The number of replications is R = 5000.

We introduce the two models by noting that they differ from the standard GARCH(1,1) model only in the definition of the conditional variance σ_t^2 in (3.1). For the EGARCH(1,1) model proposed in Nelson (1991), the conditional variance is defined by

log σ_t^2 = ω + β log σ_{t−1}^2 + α( |z_{t−1}| − E|z_{t−1}| ) + γ z_{t−1},

where z_t = r_t/σ_t. The TGARCH model is also known as the GJR-GARCH model because Glosten, Jagannathan, and Runkle (1993) proposed essentially the same model. The conditional variance for TGARCH(1,1) is given by

σ_t^2 = ω + α r_{t−1}^2 + γ r_{t−1}^2 I(r_{t−1}) + β σ_{t−1}^2, (5.3)

where I(r_{t−1}) = 1 if r_{t−1} < 0, and I(r_{t−1}) = 0 otherwise. We estimate the EGARCH(1,1) model from the returns on DJIA and Altria Group, and the TGARCH(1,1) model from the returns on GE and American Express. The four data sets, with their corresponding break points, are exactly the same as those used to estimate the GARCH(1,1) models in Table 2. The results are reported in Table 5. We notice, as expected, that the estimates of the leverage parameter γ are negative for the EGARCH models and positive for the TGARCH models. We next apply the Bartlett kernel estimator with lags q*_c·arma to simulated series of squared EGARCH(1,1) and TGARCH(1,1) observations following the models in Table 5, with n and k* equal to those obtained from the time series. The empirical sizes of the test based on R = 5000 replications are given in Table 6. The rejection frequencies are rather close to the nominal levels; the differences are less than 1.5% for α = 5% and less than 3% for α = 10%. The test is slightly too conservative when TGARCH is the DGP, but overall it is not very sensitive to model mis-specification.
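To illustrate the kind of data-generating process used in this robustness check, here is a minimal simulator for the TGARCH(1,1) recursion in (5.3) with standard normal innovations. The parameter values in the test are hypothetical, not the Table 5 estimates:

```python
import numpy as np

def simulate_tgarch(n, omega, alpha, gamma, beta, seed=0):
    """Simulate r_t with conditional variance
    sigma_t^2 = omega + alpha*r_{t-1}^2 + gamma*r_{t-1}^2*I(r_{t-1}<0) + beta*sigma_{t-1}^2."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)
    r = np.empty(n)
    sig2 = np.empty(n)
    # start from the unconditional variance (E[I] = 1/2 for symmetric innovations)
    sig2[0] = omega / (1.0 - alpha - 0.5 * gamma - beta)
    r[0] = np.sqrt(sig2[0]) * z[0]
    for t in range(1, n):
        sig2[t] = omega + (alpha + gamma * (r[t - 1] < 0.0)) * r[t - 1] ** 2 + beta * sig2[t - 1]
        r[t] = np.sqrt(sig2[t]) * z[t]
    return r
```

Squaring the simulated returns of two such models, concatenated at a chosen k*, reproduces the experimental setting of Table 6.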
To study the power of the test, we use the popular FIGARCH(p, d, q) process of Baillie et al. (1996), in which the conditional variance σ_t^2 is assumed to satisfy the equation

b(L) σ_t^2 = a + [b(L) − φ(L)(1 − L)^d] r_t^2.

We consider order-1 polynomials φ(L) = 1 − φL and b(L) = 1 − bL and two sets of values of the parameters a, b, φ which are similar to those encountered in practice; three values of the memory parameter d are used. All parameter values satisfy the stationarity conditions derived by Baillie et al. (1996), i.e. a > 0, 0 ≤ d ≤ 1 − 2φ, 0 ≤ b ≤ φ + d, and 0 < d < 1. The test is seen to have reasonably good power if d is not too close to zero. Table 7 indicates, however, that a rejection of H_0 should be viewed as strong evidence in favor of H_A, because the empirical rejection rate is never close to 100%.
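The FIGARCH recursion can be made concrete by expanding the conditional variance as σ_t^2 = a/(1−b) + Σ_j λ_j r_{t−j}^2, where the ARCH(∞) weights come from λ(L) = 1 − b(L)^{-1} φ(L)(1−L)^d, truncated at some large lag. A sketch under that truncation (the truncation point and parameter values below are illustrative):

```python
import numpy as np

def figarch_weights(d, phi, b, trunc=1000):
    """ARCH(inf) weights lambda_j of FIGARCH(1,d,1), where
    lambda(L) = 1 - (1 - phi*L)*(1 - L)^d / (1 - b*L)."""
    # binomial expansion of (1-L)^d: pi_0 = 1, pi_j = pi_{j-1}*(j-1-d)/j
    pi = np.empty(trunc + 1)
    pi[0] = 1.0
    for j in range(1, trunc + 1):
        pi[j] = pi[j - 1] * (j - 1 - d) / j
    # delta(L) = (1 - phi*L)*(1 - L)^d
    delta = pi.copy()
    delta[1:] -= phi * pi[:-1]
    # recursion from (1 - b*L)*lambda(L) = (1 - b*L) - delta(L)
    lam = np.empty(trunc + 1)
    lam[0] = 0.0
    lam[1] = -b - delta[1]            # equals phi + d - b
    for j in range(2, trunc + 1):
        lam[j] = b * lam[j - 1] - delta[j]
    return lam[1:]

def simulate_figarch(n, a, b, phi, d, trunc=1000, burn=500, seed=0):
    """Simulate r_t with s2_t = a/(1-b) + sum_j lambda_j * r_{t-j}^2 (truncated)."""
    rng = np.random.default_rng(seed)
    lam = figarch_weights(d, phi, b, trunc)
    base = a / (1.0 - b)
    hist = np.full(trunc, base)       # hist[k] = r_{t-1-k}^2, primed at the base level
    r = np.empty(n + burn)
    for t in range(n + burn):
        s2 = base + lam @ hist
        r[t] = np.sqrt(s2) * rng.standard_normal()
        hist = np.roll(hist, 1)
        hist[0] = r[t] ** 2
    return r[burn:]
```

Under the stated stationarity conditions the weights λ_j are nonnegative and sum to 1 in the infinite expansion, so the truncated conditional variance stays positive; slow hyperbolic decay of the weights is what produces the long-memory behavior in squared returns.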

Application to Indexes and Individual Stocks
Tables 8-11 on pages 272-275 show the values of the test statistic M_n for a number of indexes and individual stocks. These observed values should be compared to the critical values in Table 1. For each asset we consider two stretches of data, each of length approximately 2000. The first stretch corresponds to roughly 8 years beginning January 1, 1992, the second to roughly 8 years ending June 30, 2006. Practically all assets show an apparent increase in volatility around the middle of the first period, which persists until about the middle point of the second period. This corresponds well to the experimental setting analyzed in Table 3 and ensures that the empirical size is close to the nominal size.
We focus first on the results for stock indexes reported in Table 8. In most US indexes, we find strong evidence for long memory in the second period, but no, or only very weak, evidence in the first period. For DJC and DJT, and for the foreign markets, there is practically no evidence of long memory, and a simple change-point model is adequate. These findings are interesting, and their explanation likely requires deeper economic insights which are beyond the scope of this paper.
For the financial sector companies, see Table 9, M_n is generally larger in the second period, but the pattern of rejection and acceptance of H_0 is less clear here. Stocks of the two insurance companies show moderate evidence of long memory, but so do those of Provident Bank and UMB Financial.
In the retail sector, see Table 10, stocks of upscale department stores, Dillard's and Gottschalks, show strong evidence of long memory. Stocks in the oil and manufacturing sectors, see Table 11, generally appear to follow the change-point model.
The above discussion is not intended to provide definite economic insights, but rather to illustrate the methodology and stimulate research into the reasons behind the observed results. It should be kept in mind that in a large collection of data sets some rejections will occur due to chance; even if all series perfectly satisfy H_0, a level-α test will reject in about 100α% of cases.
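The chance-rejection point is easy to quantify: under H_0, the number of rejections among K independent level-α tests is Binomial(K, α). A small sketch (K = 40 is a hypothetical number of assets, chosen for illustration, not the count in Tables 8-11):

```python
from math import comb

def prob_at_least(K, alpha, k):
    """P(at least k rejections among K independent level-alpha tests under H0)."""
    return sum(comb(K, j) * alpha**j * (1 - alpha)**(K - j) for j in range(k, K + 1))

expected_rejections = 40 * 0.05               # about 2 spurious rejections at the 5% level
p_three_or_more = prob_at_least(40, 0.05, 3)  # probability of 3+ purely spurious rejections
```

This is why isolated rejections across a large panel should be interpreted with caution, while a clustered pattern of rejections, such as the one observed for US indexes in the second period, is more informative.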

Summary and Conclusions
We have developed a practical implementation of the test of Berkes et al. (2006) designed to distinguish between a change-point model and a long memory model. To the best of our knowledge, this is the first test of this type which can be applied to discriminate between changes in unconditional volatility and long memory in the volatility of returns. Such a test reduces spurious rejections indicating long memory when in fact a change-point model is an approximate data generating mechanism.
Our implementation uses a kernel estimator of the long-run variance of squared returns, with the maximal lag selected by a data-driven procedure which depends on the sample size, the location of the estimated change point, and the direction of the apparent volatility change (increase versus decrease). We also studied other long-run variance estimators, including the VARHAC estimator, but found that they lead to tests with inferior performance.
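For concreteness, a Bartlett-kernel long-run variance estimator with maximal lag q has the standard Newey-West form; a minimal sketch (the data-driven selection of q described above is not reproduced here):

```python
import numpy as np

def bartlett_lrv(x, q):
    """Bartlett-kernel (Newey-West) estimator of the long-run variance of x:
    gamma_0 + 2 * sum_{j=1}^{q} (1 - j/(q+1)) * gamma_j."""
    x = np.asarray(x, dtype=float)
    n = x.size
    xc = x - x.mean()
    s = xc @ xc / n                    # lag-0 sample autocovariance
    for j in range(1, q + 1):
        gamma_j = xc[j:] @ xc[:-j] / n  # lag-j sample autocovariance
        s += 2.0 * (1.0 - j / (q + 1.0)) * gamma_j
    return s
```

Applied to the squared returns, this is the quantity used to studentize the test statistic; the triangular Bartlett weights guarantee a nonnegative estimate, while the choice of q governs the size-power trade-off discussed in the simulations.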
When applied to returns on indexes and individual stocks, our test shows that long memory is less prevalent than indicated by previous studies, but may be present in some assets, at least over certain periods of time. Allowing at most one change-point under H_0 leads to the acceptance of the change-point model in most cases.

Figure 1:
Daily returns on stocks and stock indexes. Dotted lines indicate the borderlines of the subsamples, i.e., the location of the estimated change-point.

Table 1:
Asymptotic critical values of the test statistic M n .

Table 4:
Empirical sizes (in percent) of the test using the Bartlett kernel estimator with lags q*_{carma} applied to simulated series of squared GARCH(1,1) observations following the change-point models estimated from the returns on American Express and Altria Group in Table 2. The number of replications is R = 5000.

Table 7:
Empirical power (in percent) of the test using lags q*_{carma} applied to simulated series of squared FIGARCH observations.

Table 9:
Observed values of the statistic M_n for shares in the financial sector. The asterisks *, ** and *** indicate, respectively, rejection at the 10%, 5% and 1% levels.