Estimation of Time Series Models via Robust Wavelet Variance

A robust approach to the estimation of time series models is proposed. Taking from a new estimation method called the Generalized Method of Wavelet Moments (GMWM) which is an indirect method based on the Wavelet Variance (WV), we replace the classical estimator of the WV with a recently proposed robust M-estimator to obtain a robust version of the GMWM. The simulation results show that the proposed approach can be considered as a valid robust approach to the estimation of time series and state-space models.


Introduction
The robust estimation of time series parameters is still a widely open topic in statistics for various reasons.First of all, the robustness theory for dependent data is still not fully developed given that the classical robustness measures are not directly applicable in the time series context.In fact, for example, there is no unique definition of an influence function (Hampel 1974) for time series since there is no unique definition of outliers or, more specifically, there are different types of outliers which require to adapt such a measure (see Maronna, Martin, and Yohai 2006, for a detailed overview).Secondly, many of the existing methods for robust estimation of time series' parameters are limited in terms of the range of models that can be estimated and, above all, in terms of computation complexity as the models get larger or more complicated.Moreover, robust estimation of latent time series models (models made of the sum of several unobserved processes) has been largely ignored.
For robust estimation and inference for time series, a detailed list of references can be found in Maronna et al. (2006), Chapter 8. Most of the literature in this domain has dealt with standard time series models such as autoregressive and/or moving average processes, starting with the seminal work of Masreliez and Martin (1977), Denby and Martin (1979), Bustos and Yohai (1986) and Künsch (1984).Estimating robustly the parameters of latent models has not gone beyond the AR(1) plus white noise (Masreliez and Martin 1977), probably because of the difficulty in implementation of the different algorithms.
This paper intends to explore the possibilities opened up by combining two recently proposed approaches: the first concerning the robust estimation of the Wavelet Variance (WV) proposed by Mondal and Percival (2012) and the second proposed by Guerrier, Stebler, Skaloud, and Victoria-Feser (2013b) presenting a new method for the estimation of complex time series parameters based on the WV, called the generalized method of wavelet moments (GMWM).Since GMWM estimators are based on the matching between empirical and model based WV estimators, the use of a robust estimator for the WV will in principle ensure robustness of the model's parameters estimator, as done, for example, with the robust generalized method of moments (see Hansen 1982;Ronchetti and Trojani 2001).
The paper is organized as follows.In Section 2 we present a robust WV estimator proposed by Mondal and Percival (2012) that we modify to improve its robustness properties.In Section 3 we briefly present the GMWM and propose a robust version of this method, and in Section 4 we present a simulation study that involves several models, including latent time series models.

Robust Estimation of the Wavelet Variance
The WV is a quantity which is widely used throughout many scientific and engineering disciplines as a means to decompose, describe and summarize time series.For example, it has been used for over 30 years as a standard routine measure of frequency stability in lasers (see Fukuda, Tachikawa, and Kinoshita (2003)) or atomic clocks (see Allan (1987)).More recently, the WV has also been used with optical sensors (see Kebabian, Herndon, and Freedman (2005)), various types of gas monitoring spectrometers (see Bowling, Sargent, Tanner, and Ehleringer (2003); Werle, Mücke, and Slemr (1993)), sonic anemometer-thermometers (see Loescher, Ocheltree, Tanner, Swiatek, Dano, Wong, Zimmerman, Campbell, Stock, Jacobsen et al. (2005)), inertial sensors (see Guerrier (2009); El-Sheimy, Hou, and Niu ( 2008)), radio-astronomical instrumentation (see Schieder and Kramer (2001)).The WV was also used for example in Percival and Guttorp (1994) to analyse geophysics time series.This approach was also used for physiological signal analysis for example in Fadel, Orer, Barman, Vongpatanasin, Victor, and Gebber (2004) or in Gebber, Orer, and Barman (2006).In Whitcher (2004), discrete wavelet packet transforms are used to estimate one of the parameters of a seasonal long memory process for the analysis of atmospheric and economic time series.
The WV can be interpreted as the variance of a process after it has been subject to an approximate bandpass filter (Percival and Guttorp 1994).Let {X t }, t ∈ Z, be a stationary process, or a non-stationary process with stationary backward differences of order d.By applying a specific wavelet filter { hj,l }, j = 1, . . ., J to this process we obtain the Maximum Overlap Discrete Wavelet Transform (MODWT) coefficients {W j,t } (see e.g.Percival and Walden 2000) as follows where j is the scale at which the filter is applied and L j = (2 j − 1)(L 1 − 1) + 1 is the length of that filter with L 1 being the length of { h1,l }.Given the wavelet coefficients, the WV at scale j is defined as the variance of the wavelet coefficients at this scale Under the stationarity conditions defined above, it can be observed that the WV ν j is not a function of t (i.e. is time-invariant).This entails a series of properties, among which the following where σ 2 X is the variance of {X t }.Hence, the WV is a decomposition of the process variance and, as highlighted earlier, is consequently useful under many aspects if one is concerned by how the variance of a process is distributed across the different scales.
The MODWT estimator of the WV was defined in Percival (1995) and is given by νj with T j being the set of time indices for which the wavelet coefficients are free of end effects, and M (T j ) = T − L j + 1 being their number.This estimator of the WV is the most efficient asymptotically and it's properties were studied and proved in Serroukh, Walden, and Percival (2000).
An alternative estimator for the WV is based on the Discrete Wavelet Transform (DWT) coefficients (see Greenhall 1991;Percival and Guttorp 1994), for which the wavelet filter is applied to the process in (1) in a different manner.More specifically, the DWT filters a sequence {X t } on non-overlapping windows yielding the DWT wavelet coefficients where t is taken at intervals of lag L j .
However, in a recent article Mondal and Percival (2012) underline how even "a moderate amount of contamination often has a very adverse effect on conventional estimates of the wavelet variance".For this purpose they propose an M-estimator for the WV based on the transformation of the WV (a scale parameter) to a location parameter as follows They then propose to use the following M-estimator μj = argmin which is then inversely transformed and corrected for bias in order to obtain a consistent estimator for ν j .Here ψ(•) is a function of bounded variation which guarantees the robustness of the estimator.Mondal and Percival (2012) suggest four types of ψ-functions and make use of the median-type function for their simulations, that is to say ψ(z) =sign(z).This ψfunction is therefore the one which will be used in the Monte Carlo simulations presented further on in this paper.
Moreover, preliminary simulations have shown that in many cases the WV based on the DWT coefficients appear to be more appropriate for robustness purposes than the MODWT coefficients.Hence the Monte Carlo study will be done using the WV based on the DWT coefficients W j,t by using the relationship between these and the MODWT coefficients as underlined in Percival (1995).

Robust Generalized Method of Wavelet Moments
Guerrier et al. (2013b) propose a method for the estimation of complex time series models, namely the GMWM.The method extends from the GMM setting and uses the implicit link which exists between the WV and the underlying assumed model P θ .The link is the following where S W j (f ) is the Power Spectral Density (PSD) function for the wavelet coefficients W j or W j,t , H j (f ) = L 1 −1 l=0 hj,l e −i2πf l denotes the transfer function of the wavelet filters hj,l (or h j,l ), with | • | being the modulus, and S P θ is the PSD implied by P θ .Hence there is a link between the WV and P θ .
Let us define ν = [ν 1 , . . ., ν J ] as the vector of WV and ν(θ) as the WV vector implied by the process P θ .Taking advantage of the above link, Guerrier et al. (2013b) propose the following estimator θ = argmin where Ω is an appropriate positive definite weighting matrix.The authors provide the proofs of consistency of the estimator for a number of time series models as well as of its asymptotic normality.
The idea behind this paper is to combine the estimation method presented in Section 2 with the GMWM.Hence, instead of using the classical estimator of the WV defined in (4), we propose to use the transformed and corrected version of the estimator in (7) using the DWT.
We then use this estimator for ν in ( 9) to obtain a robust estimation method.
This proposed approach has its theoretical bases in the papers by Ronchetti and Trojani (2001) and Genton and Ronchetti (2003).Using a robust estimator of ν implies a robust estimator for θ with a bounded influence function since a bounded estimator for ν bounds the function ( ν − ν(θ)).
The next section presents a Monte Carlo study to investigate the performance of this new approach on different stochastic processes.

Monte Carlo Study
In this section we present a Monte Carlo study of the estimator proposed in Section 3. We will investigate the performance of the estimator on three processes, namely a white noise process (WN), a first-order autoregressive process (AR1) and a composite stochastic process like the simulation presented in Guerrier et al. (2013b).
In addition to the wavelet moments used in the latter article, Guerrier, Stebler, Skaloud, and Victoria-Feser (2013a) suggest using additional moments of the processes to improve the performance of the GMWM estimator.Hence, the simulations will use the second moment in the case of the WN and AR1 processes and the first and second moments of the firstorder difference of the composite process since the latter is stationary and has a non-zero expectation.
We will compare three estimators: the Maximum Likelihood Estimator (MLE), the classical GMWM estimator (GMWM) and the robust estimator proposed in the present paper (RGMWM).For the classic GMWM, the first and second moments will be estimated respectively via the classical estimators of mean and variance whereas the third estimator will use respectively the median and the M-estimator proposed in (7).For the classical and robust WV estimator, the DWT wavelet transform is used.In all studies, processes of length L = 1000 were simulated and the contaminations were additive (i.e.Gaussian noise with a specific variance σ 2 was added to an -percentage of the observations of the underlying process).

White Noise
A white noise process can be written as X t iid ∼ N (0, σ 2 ).
Hence, the only parameter to be estimated is σ 2 .The performance of the proposed estimators when there is no contamination is illustrated in Figure 1 where all estimators appear to be unbiased.The RGMWM however has a larger variance which is to be expected since efficiency is the price to pay for robustness.2, the advantage of using the RGMWM is evident.

First-Order Autoregressive
A first-order autoregressive process can be represented as follows where φ is the autoregressive parameter and t iid ∼ N (0, σ 2 ). Figure 3 shows how the proposed RGMWM estimator appears to confirm its robustness properties under a 1%-contaminated process with additive noise with σ 2 = 9.Its improved performance compared to the classical estimators is highlighted by the results in Table 3.The latter table appears to indicate that this robust approach is particularly convenient for estimating the σ 2 of the innovation process compared to the autoregressive parameter φ.

Latent Time Series Model
The GMWM methodology was mainly developed to estimate models made up by latent processes.An example of such a process was given in Guerrier et al. (2013b) as a sum Figure 2: Finite sample performance of the MLE, GMWM and RGMWM estimators on a 5%-contaminated white noise process of length L = 1, 000, with σ = 1 and contamination generated by adding Gaussian noise with σ 2 = 100.MLE represents the maximum likelihood estimator, GMWM represents the classic GMWM estimator with additional second moment of the process, RGMWM represents the robust GMWM based on the M-estimator proposed by Mondal and Percival (2012) with DWT wavelet transforms.
Table 2: Finite sample bias, variance and MSE of the MLE, GMWM and RGMWM estimators on a 5%-contaminated white noise process of length L = 1, 000, with σ = 1 and contamination generated adding Gaussian noise with σ 2 = 100.MLE represents the maximum likelihood estimator, GMWM represents the classic GMWM estimator with additional second moment of the process, RGMWM represents the robust GMWM based on the M-estimator proposed by Mondal and Percival (2012) with DWT wavelet transforms.MLE GMWM RGMWM Bias 4.988 • 10 −1 5.018 Figure 3: Finite sample performance of the MLE, GMWM and RGMWM estimators on a 1%-contaminated first-order autoregressive process of length L = 1, 000 with σ = 1, φ = 0.9 and contamination generated by adding Gaussian noise with σ 2 = 9.MLE represents the maximum likelihood estimator, GMWM represents the classic GMWM estimator with additional second moment of the process, RGMWM represents the robust GMWM based on the M-estimator proposed by Mondal and Percival (2012) with DWT wavelet transforms.
Table 3: Finite sample bias, variance and MSE of the MLE, GMWM and RGMWM estimators on a 1%-contaminated first-order autoregressive process of length L = 1, 000 with σ = 1, φ = 0.9 and contamination generated by adding Gaussian noise with σ 2 = 9.MLE represents the maximum likelihood estimator, GMWM represents the classic GMWM estimator with additional second moment of the process, RGMWM represents the robust GMWM based on the M-estimator proposed by Mondal and Percival (2012)  of an autoregressive process, a drift process {ω} and a white noise process as follows For these kind of processes, the GMWM (along with the robust version presented in this paper) presents important advantages over alternative approaches (see Guerrier et al. (2013b)).
When contaminating this process (with φ = 0.95, ω = 0.04, σ 2 AR = 16, σ 2 W N = 4) with 5%additive outliers with σ 2 = 9, the results seem to indicate that the classic GMWM does not appear to be greatly affected by this contamination for the first three parameters.However, it shows all the impact of the outliers for the estimation of the white noise innovation parameter σ 2 W N where the RGMWM shows only a very slight bias.The results presented in Table 4 show how the GMWM seems to perform better than the proposed RGMWM except for the parameter σ 2 W N where the GMWM is completely biased.Therefore, although slightly biased for most of the parameters, the RGMWM limits this bias for all parameters whereas the classical GMWM loses this property for one parameter.The improved performance of the RGMWM on the innovation parameters could be explained by the fact that the latter process is especially identifiable at the first scales of the WV which are also the most informative (i.e. they have a larger number of wavelet coefficients).

Conclusions and Perspectives
Given the theoretical bases and the results of the Monte Carlo studies, the proposed estimator appears to be an extremely valid candidate for the robust estimation of time series models.Knowing the theoretical WV ν(θ) of a process, it is possible to estimate the parameters θ of this process in a robust manner.
The theoretical WV of many processes can be derived from the results in Zhang (2008) or, as an alternative, Guerrier et al. (2013b) suggest to use indirect inference to overcome the complexity of calculations for certain models.Hence, the proposed estimator is easily implemented and computationally inexpensive while at the same time providing a robust estimation method for many processes for whom robust estimation methods are scarce.
There are many possible developments for this method, including the study of its asymptotic properties.Given the variety of wavelet decompositions, different wavelets and filtering methods could be explored to understand if some of them could contribute more effectively to the robust estimation approach presented in this paper.Moreover, as highlighted earlier, Guerrier et al. (2013a) suggested additional adjustments to the GMWM methodology to improve its performance and its robust equivalents could be considered to improve the performance also of the approach proposed in this paper.
Table 4: Finite sample bias, variance and MSE of the GMWM and RGMWM estimators on a 5%-contaminated composite process (10) of length L = 1, 000, with φ = 0.95, ω = 0.04, σ 2 AR = 16, σ 2 W N = 4 and contamination generated by adding Gaussian noise with σ 2 = 9.GMWM represents the classic GMWM estimator with additional first and second moment of the first-order difference of the process, RGMWM represents the robust GMWM based on the M-estimator proposed by Mondal and Percival (2012)

Figure 1 :
Figure1: Finite sample performance of the MLE, GMWM and RGMWM estimators on an uncontaminated white noise process of length L = 1, 000, with σ = 1.MLE represents the maximum likelihood estimator, GMWM represents the classic GMWM estimator with additional second moment of the process, RGMWM represents the robust GMWM based on the M-estimator proposed byMondal and Percival (2012) with DWT wavelet transforms.