On Independent Component Analysis with Stochastic Volatility Models

Consider a multivariate time series where each component series is assumed to be a linear mixture of latent mutually independent stationary time series. Classical independent component analysis (ICA) tools, such as fastICA, are often used to extract latent series, but they don’t utilize any information on temporal dependence. Also financial time series often have periods of low and high volatility. In such settings second order source separation methods, such as SOBI, fail. We review here some classical methods used for time series with stochastic volatility, and suggest modifications of them by proposing a family of vSOBI estimators. These estimators use different nonlinearity functions to capture nonlinear autocorrelation of the time series and extract the independent components. Simulation study shows that the proposed method outperforms the existing methods when latent components follow GARCH and SV models. This paper is an invited extended version of the paper presented at the CDAM 2016 conference.


Introduction
In this paper we assume that the observed p-variate time series x = (x t ) t=0,±1,±2,... follows the basic independent component (IC) model where µ is a p-variate location vector, Ω is a full-rank p × p mixing matrix and z = (z t ) t=0,±1,±2,... is an unobservable p-variate stationary time series such that (i) E(z t ) = 0, (ii) COV(z t ) = I p and (iii) the component series of z are independent.
Then x is also stationary with E(x t ) = µ and COV(x t ) = Σ = ΩΩ .In independent component analysis (ICA) the goal is to find, using the observed time series x 1 , . . ., x T , an estimate of an unmixing matrix W such that W x = (W x t ) t=0,±1,±2,... has independent component series.
The IC model has recently received a lot of attention in financial time series analysis as complicated p-variate time series models can then be replaced by p simple univariate (e.g.ARMA or GARCH) models in parameter estimation and prediction problems.The model also serves as a dimension reduction tool as often only few component series in z are relevant while the rest just present noise.For some recent contributions, see Broda and Paolella (2009); Chen, Härdle, and Spokoiny (2007); García-Ferrer, González-Prieto, and Peña (2012); Lu, Wu, and Lee (2009); Oja, Kiviluoto, and Malaroiu (2000).
In the literature standard ICA methods, such as fastICA, are often used to estimate an unmixing matrix W in a time series context although such methods only use the marginal distribution of x t and make no use of the information on temporal dependence.On the other hand, there exist second order source separation methods, like SOBI (Belouchrani, Abed Meraim, Cardoso, and Moulines 1997), which are particularly popular for analyzing biomedical data.Such methods use autocovariances and cross-autocovariances for the estimation.They are capable of separating time series with nonzero linear autocorrelations, but they do not utilize nonlinear autocorrelations.
Volatility clustering is a common feature in economic and financial time series, i.e. there are periods of lower and higher volatility.As the transitions between such periods do not typically have any clear pattern, they are treated as random occurrences.There are a vast amount of different models that have been invented for such situations.In our simulations we consider two popular choices, the GARCH model (Bollerslev 1986) and the SV model (Taylor 1982).For further information on stochastic volatility and a recent overview of stochastic volatility models, see for example Matteson and Ruppert (2011).
In this paper we review various independent component estimators that use nonlinear autocorrelations, and compare their performance to that of fastICA in a simulation study where independent time series components follow GARCH and SV models.The paper has the following structure.First, in Section 2 we define the aforementioned univariate stochastic volatility models.In Section 3 we discuss the ICA methods which are considered in this paper and suggest our extension.In Section 4 we show that this extension has the important affine equivariance property.Section 5 consists of the simulation study.

Stochastic volatility models for univariate series
There are several different stochastic volatility models.Here we concentrate on two widely used ones.The first one is GARCH (Generalized Autoregressive Conditional Heteroscedasticity) process (Bollerslev 1986) defined as follows.A univariate GARCH(p, q) process is given by where t is an independent white noise process and σ 2 t is a conditional variance process with ω > 0 and α i , β j ≥ 0 ∀i, j.For (second order) stationarity, we require that p i=1 α i + q j=1 β j < 1.Another popular model is the SV (Stochastic Volatility) model (Taylor 1982), defined as where t and η t are two independent white noise innovation processes.The parameter µ is the level, φ is the persistence and ση t is the volatility of the log-variance.The process h t is called the volatility process and it is strongly stationary with N (0, 1) innovations and initial state h 0 ∼ N (µ, σ 2 /(1 − φ 2 )).For stationarity, we require |φ| < 1 and µ ∈ R.

Source separation for multivariate time series
Under our model assumption, the standardized multivariate series of x t is given by x st t = Σ −1/2 (x t − µ).One of the key results in ICA states that there exists an orthogonal matrix U = (u 1 , . . ., u p ) such that z t = U x st t (up to signs and order of the components) (Miettinen, Taskinen, Nordhausen, and Oja 2015).Here z t denotes the vector of independent series.The final unmixing matrix functional is then given by W = U Σ −1/2 .The estimate of W is then obtained by replacing Σ and U by their sample counterparts.For finding U , we next list the criterion functions in different approaches.
In the symmetric fastICA (Hyvärinen and Oja 1997) and in the symmetric squared fastICA (Miettinen, Nordhausen, Oja, Taskinen, and Virta 2017a) U maximizes Here a twice continuously differentiable, nonlinear and nonquadratic function where y ∼ N (0, 1).Notice that both utilize only the stationary (marginal) distribution of x t .
The estimators presented below make use of the joint distributions of (x t , x t+k ), k = 1, 2, . . . .The classical SOBI uses only second moments and it was originally defined as a method which jointly diagonalizes several autocovariance matrices.However, SOBI can be reformulated so that U maximizes The solution is unique if, for all pairs i = j there exists a k, 1 ≤ k ≤ K, such that E(z t,i z t+k,i ) = E(z t,j z t+k,j ).SOBI fails to separate GARCH and SV time series as all lagged autocovariances are in such cases zero.
The gJADE procedure (Matilainen et al. 2015), in turn, uses a much richer sum of fourth cumulants and maximizes where Again, for K = 0, the regular ICA method JADE (Cardoso and Souloumiac 1993) is obtained.Both, gFOBI and gJADE, were created having stochastic volatility models in mind.
FastICA does not use any knowledge of temporal dependence, but there exists some fixedpoint algorithms aimed for a time series context.The FixNA (Fixed-point algorithm for maximizing the nonlinear autocorrelation) method was introduced in Shi, Jiang, and Zhou (2009), and its criterion function to be maximized is where G is a twice continuously differentiable function.The G-functions suggested in Shi et al. (2009) are G(z) = log(cosh(z)) and G(z) = z 2 .
A similar function to be maximized is of the form and we will denote it as FixNA2.It was first proposed in Hyvärinen (2001), however only with G(z) = z 2 and K = 1.We further similarly suggest a natural extension of SOBI with the criterion function .
As a variant of SOBI, we call this estimator vSOBI.
The term E G(u i x st t ) 2 in D 2 (U ) and D 3 (U ) is used to normalize the summands.Notice that in SOBI, G(z) = z, and hence the aforementioned term equals to 0. When G(z) = z 2 , the term equals to 1.
To obtain the estimating equations for matrix U , the Lagrangian multiplier technique can be used as in Miettinen, Illner, Nordhausen, Oja, Taskinen, and Theis (2016).The Lagrangian function to be optimized is and Then multiplying this by u j from the left side gives From here the estimating equations for an orthogonal U can be obtained.
Result 2. The estimating equations for an orthogonal matrix U are or, equivalently, U = (T r T r ) −1/2 T r .
For vSOBI for example, to maximize D 3 (U ), we need to calculate T 3 = (T 3,1 , . . ., T 3,p ) , where Therefore in all three cases r = 1, 2, 3 the computation of U which maximizes D r (U ) can be done iteratively given some initial orthogonal matrix U 0 and some tolerance limit ε as described in the algorithm below.

Affine equivariance
In blind source separation it is desirable that an unmixing matrix estimator is affine equivariant, which means that the separation performance does not depend on the actual value of the mixing matrix Ω.Let x * t = Ax t + b be an affine transformation of x t , where A is a non-singular p × p matrix and b is a p-vector.Then a method is called affine equivariant if W * = W A −1 and W x t = W * x * t , up to location shifts, sign changes and the order of the components.
Next write U * = U V = (u * 1 , . . ., u * p ) , where u * i = V u i , for i = 1, . . ., p. Now U * is an orthogonal matrix and Clearly also G(u * i x * st t+k ) = G(u i x st t+k ), where k = 1, . . ., K. Then D 3 (U * ) is the maximum of the criterion function for x * t , and Using the same arguments it can be shown that also FixNA and FixNA2 algorithms are affine equivariant.Notice also that the fastICA versions discussed here, SOBI, gFOBI and gJADE are all affine equivariant.
As a performance measure we use the Minimum Distance Index (Ilmonen, Nordhausen, Oja, and Ollila 2010), which is defined as where C is the set of all matrices with exactly one non-zero element in each row and column, and || • || is the Frobenius (matrix) norm.The index has the range 0 ≤ D ≤ 1, where zero indicates perfect separation.
Performance Figure 1 summarizes the results for both settings.Notice that for clarity we omitted the results for SOBI as it, as expected, failed to find any latent sources.From the four different fastICA versions only the best one is presented, and for FixNA only the pow version is shown as FixNA(lcosh) was much worse than any other FixNA method.FixNA(lcosh) also exhibited a strange convergence pattern as it was deteriorating when the sample sizes increased.It is seen in Figure 1 that the proposed vSOBI estimator works very well both in GARCH and in SV settings and, when using G(z) = log(cosh(z)) as the nonlinearity function, it clearly outperforms all the other estimators.Squared fastICA algorithms produce slightly better results than the original fastICA algorithms.In GARCH setting the symmetric squared fastICA with G(z) = z 4 − 3 produces better results that other fastICA algorithms, but compared to other algorithms its performance is still not good, as only gFOBI and SOBI have poorer performance.In the SV setting however, while regular fastICA and squared fastICA algorithms with G(z) = z 4 − 3 are not among the best, both versions with log(cosh(z)) type nonlinearities give good results.The squared version is the better one and, when compared to other algorithms, we can see that only vSOBI with G(z) = log(cosh(z)) is better.Notice that unlike the other estimators, fastICA algorithms do not utilize any information on temporal dependence.
In general it seems better to use FixNA2 instead of FixNA, perhaps due to FixNA2 having the more natural centering part in the objective function.The performance of gFOBI is poor in both settings, but gJADE seems to work decently in SV setting.
Convergence The convergence percentages of the proposed vSOBI algorithms are good, and in time series of length 800 onwards very close to 100 %.SOBI has no converge issues, but as stated before, it has very poor performance.Also gFOBI and gJADE have very few convergence issues, and only when the time series is very short.
On the other hand, FixNA2 algorithms have lots of convergence problems in short time series, as they converged in less than 25 % of repetitions in time series of length 100.As the time series length increases, the convergence rate approaches 100 %.FastICA algorithms have more issues in the GARCH setting (less than 50 % of repetitions converging in the beginning) than in the SV setting.
FixNA(pow) has only some convergence issues in short time series.On the other hand, convergence of FixNA algorithm when using G(z) = log(cosh(z)) as the nonlinearity function is surprising in SV setting, as there are more convergence issues when time series length increases, contrary to what is expected (not shown in figures).For time series length 100 its convergence percentage is 98.5, while with time series length 25600 it is only 76.2.The reason Converge results are summarized in Figure 2. Except for FixNA(lcosh) in the SV setting, convergence percentages of all algorithms are larger than 99.4 % in time series of length 25600.

Discussion
In this paper we surveyed different blind source separation methods suitable for multivariate time series with stochastic volatility features.Such methods were earlier quite scattered in the literature.We suggested a small modification to existing methods yielding the family of vSOBI estimators.The simulations were used to compare the vSOBI estimators with previously proposed methods using stochastic volatility models.The proposed vSOBI estimator with G(z) = log(cosh(z)) as the nonlinearity function showed the best performance among its competitors.
SOBI as a second order method was designed to function on time series with nonzero linear autocovariances, such as ARMA processes.In stochastic volatility models linear autocovariances are zero, and therefore SOBI is useless.On the other hand, the methods which exploit nonlinear autocorrelations, considered in this paper, are far from optimal in separating ARMA processes.In practical situations one can easily imagine a time series independent component model where some components are ARMA processes and others exhibit stochastic volatility features.Therefore it is desirable to derive methods which work in such cases.
In our future research we will consider a weighted combination of SOBI and vSOBI yielding for example the objective function .
with some suitable weight a ∈ [0, 1] and function G.This could still be generalized to case where SOBI and vSOBI parts use different combinations of lags.

Figure 1 :
Figure 1: Comparison of performance of algorithms in the GARCH setting (left panel) and in the SV setting (right panel).

Figure 2 :
Figure 2: Comparison of convergence percentages of algorithms in the GARCH setting (left panel) and in the SV setting (right panel)