Spatial Blind Source Separation in the Presence of a Drift

Multivariate measurements taken at different spatial locations occur frequently in practice. Proper analysis of such data needs to consider not only on-site dependencies but also dependencies in and in-between variables as a function of spatial separation. Spatial Blind Source Separation (SBSS) is a recently developed unsupervised statistical tool that deals with such data by assuming that the observable data are formed by a linear latent variable model. In SBSS the latent variable is assumed to be constituted by weakly stationary random fields which are uncorrelated. Such a model is appealing as further analysis can be carried out on the marginal distributions of the latent variables, interpretations are straightforward as the model is assumed to be linear, and not all components of the latent field might be of interest, which acts as a form of dimension reduction. The weak stationarity assumption of SBSS implies that the mean of the data is constant for all sample locations, which might be too restrictive in practical applications. Therefore, an adaptation of SBSS that uses scatter matrices based on differences was recently suggested in the literature. In our contribution we formalize these ideas, suggest an adapted SBSS method and show its usefulness on synthetic and real data.


Introduction
Data that are collected at different spatial locations occur quite often in statistical modeling. Pollution measurements in cities, element concentrations in mines or socioeconomic variables across countries are examples of measurements which show dependencies as a function of spatial separation. Advances in technology have significantly improved data collection, which leads to higher numbers of sample locations as well as measured variables. Therefore, proper statistical tools for such multivariate spatial data need to consider spatial dependencies in and in-between the variables of interest.
In a first exploratory step, multivariate spatial data might be analyzed with the popular Principal Component Analysis (PCA), where variance-maximizing orthogonal transformations of the data are found. The disadvantages of this method in the present context are obvious: only variance is maximized, which ignores the spatial second-order dependencies completely, orthogonal transformations might be too restrictive, and the result depends on the scale of the data. Nordhausen, Oja, Filzmoser, and Reimann (2015) and Bachoc, Genton, Nordhausen, Ruiz-Gazen, and Virta (2020) introduced Spatial Blind Source Separation (SBSS) as an adaptation of Blind Source Separation (BSS) for spatial data, which overcomes these significant drawbacks of PCA. BSS is a framework which originates from signal processing and assumes that the multivariate data at hand are formed by linear transformations (not necessarily orthogonal) of unobserved variables. The aim of BSS is to estimate these latent variables, which might show a clearer structure and reveal the driving processes of the data. BSS is well-established for many types of data, such as independent and identically distributed (iid) data, where it is denoted as Independent Component Analysis (ICA) (Nordhausen and Oja 2018a), time series data (Pan, Matilainen, Taskinen, and Nordhausen 2022) or tensorial data (Virta, Li, Nordhausen, and Oja 2017). General overviews are provided by Cichocki and Amari (2002) and Comon and Jutten (2010). SBSS, described in detail in Section 2, assumes that the observed multivariate data are formed by (spatially) uncorrelated, weakly stationary latent random fields. The motivation behind these assumptions is clear. Firstly, as the entries of the latent random field are uncorrelated, univariate analysis can be carried out individually, which avoids demanding multivariate approaches (Muehlmann, Nordhausen, and Yi 2021b; Muehlmann, Bachoc, Nordhausen, and Yi 2022b). Secondly, the latent random field is found by maximizing second-order spatial dependence, but interpretations are still straightforward as the identified transformations are linear. Lastly, only certain components might be of interest for further analysis or the domain expert, which acts as a way of dimension reduction. SBSS finds the latent random field by jointly diagonalizing the covariance and so-called local covariance matrices that capture second-order spatial dependence.
The original assumption of SBSS that the entries of the latent field are second-order stationary might be too restrictive in practical considerations. In many situations it is more natural to consider a drift in the data, which violates the constant mean assumption. Muehlmann, Filzmoser, and Nordhausen (2020) considered that case by replacing local covariance matrices with local difference matrices, which avoids the estimation of the mean and is practically more robust in the presence of a drift. However, this publication did not provide a theoretical framework for its methods. Moreover, SBSS methods usually rely on a whitening step. If the regular covariance matrix is used for whitening, then a non-constant mean is again a problem as the covariance also relies on the estimation of the mean. This is addressed in the subsequent chapters. The main contribution of the present paper is three-fold: (i) In Section 2 we introduce SBSS in an intuitive and self-contained way to more applied scientists in the field of spatial data. (ii) Section 3 is devoted to the use of local difference matrices in SBSS; it puts the ideas of Muehlmann et al. (2020) on a solid basis and introduces a novel SBSS method with an accordingly adapted whitening step based on differences. (iii) We validate the performance of the newly introduced SBSS method in a thorough simulation study in Section 4 and show how this method can be used in a real-world data application in Section 5. Lastly, Section 6 summarizes the main theoretical and experimental results, Section 7 discusses the impact of the newly introduced methods and gives an outlook on further research, and Section 8 draws the main conclusions of the present paper.

Spatial blind source separation
BSS for spatial data is a relatively new field in multivariate geostatistics. It was first introduced by Nordhausen et al. (2015), where the method was motivated by a geochemical application. Bachoc et al. (2020) put SBSS on a sound theoretical basis, refined SBSS, and derived asymptotic properties for the estimators. In the following three parts we motivate the statistical SBSS model, present two methods to estimate the involved parameters and discuss the connections to PCA.

Stationary SBSS model
The main motivation for SBSS is the assumption that the multivariate observed random field is formed by an affine transformation of an unobserved latent random field. This is useful as the latent random field is meant to represent the physical processes that led to the observed random field. In more mathematical terms this can be formulated as

x(s) = A z(s) + m.    (1)

Equation (1) consists of two random and two deterministic components. The former are the observed p-variate random field x(s) = (x_1(s), ..., x_p(s))⊤ and the unobserved latent random field z(s) = (z_1(s), ..., z_p(s))⊤, and the latter are the p × p invertible mixing matrix A and the p-dimensional location vector m. This model is referred to in the literature as the location-scatter model. A detailed discussion of this model and BSS is given by Nordhausen and Oja (2018a,b). The goal now is to reverse this transformation and to recover the latent random field z(s) given only the observed one x(s). As stated, this is impossible since too many parameters of the model are unknown; thus, further assumptions on the latent random field are needed to make the model (more) identifiable. These assumptions come from the physical motivation of the model: as the entries of z(s) are meant to represent physical processes, they are assumed to be independent, as physical processes mainly act independently of each other. Moreover, more technical assumptions are needed, namely that the mean of z(s) is zero, the covariance matrix is the identity matrix, and the entries are weakly stationary. A thorough discussion on why these assumptions are needed is given by Nordhausen et al. (2015) and Bachoc et al. (2020). The former considerations can now be stated as a statistical model as follows.
Definition 1 (SBSS model). A p-variate random field x(s) defined on a d-dimensional spatial domain S ⊆ R^d follows a spatial blind source separation model if it can be written as x(s) = A z(s) + m for all s ∈ S, where A is the full-rank p × p deterministic mixing matrix, m is the constant, p-dimensional, deterministic drift vector and z(s) is the p-variate latent random field which fulfills the following assumptions.
(SBSS 1): E(z(s)) = 0 for all s ∈ S,
(SBSS 2): Cov(z(s)) = E(z(s) z(s)⊤) = I_p for all s ∈ S, and
(SBSS 3): Cov(z(s), z(s′)) = E(z(s) z(s′)⊤) = D(s − s′) for all s, s′ ∈ S with s ≠ s′, where D is a diagonal matrix containing the stationary covariance functions of each entry of z(s) as diagonal elements.
The SBSS model is a semi-parametric linear latent variable model as z(s) is unobserved and only assumptions on the first two moments are stated. Furthermore, z(s) is only defined up to sign and order, as PSz(s) still fulfills all assumptions stated above, where P is a permutation matrix and S is a diagonal matrix where each diagonal element is either +1 or −1. This holds true for any BSS model in general, and usually this is considered of minor importance as the sign and order might be determined by the context of the analysis or are not of interest at all. Note that the scale is not identifiable for BSS in general, but in the model stated above it is fixed by assumption (SBSS 2) to be unity; this assumption will be given up in a subsequent section.
As stated above, the goal of SBSS is to recover the latent field z(s) based on the model above, which is usually done by finding the so-called unmixing matrix W and the drift m which recover the latent field by z(s) = W(x(s) − m) up to sign and permutations, based only on one given realization of x(s) on n sample locations s_1, ..., s_n. There exist two methods in the literature to solve this problem, but before going into detail we explain the general strategy by which these two methods are constructed. Both follow the same three steps: 1. Find the covariance matrix Cov(x(s)) and the drift m of the observed random field x(s) and define the whitened version of x(s) by Cov^{-1/2}(x(s))(x(s) − m).
2. Find an orthogonal transformation Ũ based on the whitened version of x(s). 3. Combine both steps into the unmixing matrix W = Ũ⊤ Cov^{-1/2}(x(s)).
The following estimators differ only in step 2, in the way the orthogonal transformation is found. The reason why such a three-step procedure is adequate can be seen by considering the singular value decomposition of the mixing matrix A = UDV⊤, which determines the covariance matrix of x(s) to be Cov(x(s)) = UD²U⊤. As the covariance matrix is positive definite by assumption, one can whiten x(s) by Cov^{-1/2}(x(s))(x(s) − m). Now we can plug in Equation (1) and arrive at Cov^{-1/2}(x(s))(x(s) − m) = Cov^{-1/2}(x(s)) A z(s) = UV⊤ z(s). Thus, the whitened version of x(s) is only an orthogonal matrix Ũ⊤ = UV⊤ away from the latent field z(s). More details and a proper mathematical derivation are given by Miettinen, Taskinen, Nordhausen, and Oja (2015). Now we can detail how different SBSS methods find the orthogonal transformation.
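The algebra of the whitening step can be illustrated with a short numpy sketch. The mixing matrix A and drift m below are made-up toy values, and the latent field is sampled iid since no spatial structure is needed to show that the whitened field is only one orthogonal rotation away from z(s):

```python
import numpy as np

rng = np.random.default_rng(1)
p, n = 3, 5000

# Hypothetical toy setup: latent field with identity covariance.
z = rng.standard_normal((n, p))
A = np.array([[2.0, 0.5, 0.0],
              [1.0, 1.5, 0.3],
              [0.2, 0.4, 1.0]])            # full-rank mixing matrix
m = np.array([1.0, -2.0, 0.5])             # constant drift
x = z @ A.T + m                            # observed field, Equation (1)

# Step 1: whiten with Cov^{-1/2}(x)(x - mean)
cov = np.cov(x, rowvar=False)
evals, evecs = np.linalg.eigh(cov)
cov_isqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
x_wt = (x - x.mean(axis=0)) @ cov_isqrt.T

# Cov^{-1/2} A = U V^T is (approximately) orthogonal, so the whitened
# field equals an orthogonal rotation of the latent field.
Q = cov_isqrt @ A
```

With the population covariance A A⊤ in place of the sample estimate, Q would be exactly orthogonal; the small deviation here is finite-sample estimation error.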

Two SBSS estimators
The three-step outline above on how to construct SBSS estimators can also be interpreted from a different viewpoint, namely, as carrying out steps which transform x(s) in such a way that the assumptions of the model in Definition 1 are met.
Step 1 takes care of (SBSS 1) and (SBSS 2) as the whitened version of x(s) has zero mean and identity covariance. The aim of step 2 is to find an orthogonal matrix that makes the spatial covariance matrix diagonal for any spatial lag, so that (SBSS 3) is met as well. To do so, the spatial covariance needs to be estimated. Therefore, Nordhausen et al. (2015) and Bachoc et al. (2020) introduced local covariance matrices (LCov) as a key tool for SBSS, which for n given sample locations are defined as

LCov_f(x(s)) = (1/n) Σ_{i=1}^n Σ_{j=1}^n f(s_i − s_j) E((x(s_i) − m)(x(s_j) − m)⊤).

LCov matrices are weighted averages of the covariance matrices between all possible pairs of the n given sample locations, where the weights are determined by the spatial kernel function f: R^d → R. Nordhausen et al. (2015) suggested ball kernel functions f_b(h) = I(∥h∥ ≤ r) which only consider locations that are separated by a maximum distance of r. Bachoc et al. (2020) additionally considered ring and Gaussian kernel functions. The ring kernel function is written as f_r(h) = I(r_in < ∥h∥ ≤ r_out), which considers locations that are separated by a minimum of r_in and a maximum of r_out. A smooth version of the ball kernel function is the Gaussian kernel function f_g(h) = exp(−0.5 (Φ^{-1}(0.95) ∥h∥ / r)²), where Φ^{-1}(0.95) is the 95% quantile of the standard normal distribution. Generally, the kernel function can be of different shapes, where, for example, anisotropies can be modeled by also accounting for the direction of h. LCov matrices are symmetric if the kernel function is symmetric, i.e., f(h) = f(−h). Note that when f(h) = I(∥h∥ = 0), LCov matrices reduce to the usual covariance matrix, denoted as LCov_0 in the following.
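The three kernel families and a sample version of LCov (with the sample mean replacing m) can be sketched in a few lines of numpy; the function names and the 1/n normalization are our own choices:

```python
import numpy as np

def ball_kernel(h, r):
    """f_b(h) = I(||h|| <= r)."""
    return (np.linalg.norm(h, axis=-1) <= r).astype(float)

def ring_kernel(h, r_in, r_out):
    """f_r(h) = I(r_in < ||h|| <= r_out)."""
    d = np.linalg.norm(h, axis=-1)
    return ((d > r_in) & (d <= r_out)).astype(float)

def gauss_kernel(h, r):
    """f_g(h) = exp(-0.5 (Phi^{-1}(0.95) ||h|| / r)^2)."""
    d = np.linalg.norm(h, axis=-1)
    q = 1.6448536269514722            # Phi^{-1}(0.95)
    return np.exp(-0.5 * (q * d / r) ** 2)

def local_covariance(x, coords, kernel):
    """Sample LCov: (1/n) sum_ij f(s_i - s_j)(x_i - xbar)(x_j - xbar)^T."""
    n = len(x)
    xc = x - x.mean(axis=0)
    h = coords[:, None, :] - coords[None, :, :]   # pairwise spatial lags
    w = kernel(h)                                 # (n, n) kernel weights
    return (xc.T @ w @ xc) / n
```

With the degenerate kernel f(h) = I(∥h∥ = 0) the weight matrix becomes the identity and `local_covariance` reduces to the (biased) sample covariance matrix.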
Considering the SBSS model above, the LCov matrices evaluated on z(s) yield a diagonal matrix which motivates the following method described by Bachoc et al. (2020).
Definition 2. For a random field x(s) following the SBSS model in Definition 1, the unmixing matrix functional W simultaneously diagonalizes the covariance matrix and one local covariance matrix for a given kernel function f such that

W LCov_0(x(s)) W⊤ = I_p   and   W LCov_f(x(s)) W⊤ = D_f,

where D_f is a diagonal matrix with decreasingly ordered diagonal elements.
For the above method the exact diagonalizer W can be found by solving the generalized eigenproblem, or equivalently in a two-step fashion by firstly whitening the data as x_wt(s) = LCov_0^{-1/2}(x(s))(x(s) − m) and then performing an eigendecomposition of LCov_f(x_wt(s)). Note that the ordering of the diagonal elements of D_f fixes the order of the components of the latent field. As only one specific LCov matrix is utilized, this method is very sensitive to the specific choice of the spatial kernel function. One workaround is suggested by Bachoc et al. (2020), leading to the following definition.
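The two-step procedure can be sketched as follows; the helper name `sbss_two_step` and the implementation details are illustrative assumptions, while the decreasing-order convention follows Definition 2:

```python
import numpy as np

def sbss_two_step(x, coords, kernel):
    """Two-step solver sketch for Definition 2: whiten with LCov_0^{-1/2},
    then eigendecompose the local covariance of the whitened field."""
    n, p = x.shape
    xc = x - x.mean(axis=0)
    cov = xc.T @ xc / n                        # LCov_0 (sample version)
    evals, evecs = np.linalg.eigh(cov)
    cov_isqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    x_wt = xc @ cov_isqrt.T                    # whitened field
    h = coords[:, None, :] - coords[None, :, :]
    w = kernel(h)                              # spatial kernel weights
    lcov_wt = (x_wt.T @ w @ x_wt) / n          # LCov_f of the whitened field
    lcov_wt = (lcov_wt + lcov_wt.T) / 2        # enforce exact symmetry
    d, u = np.linalg.eigh(lcov_wt)
    u = u[:, np.argsort(d)[::-1]]              # decreasing order fixes D_f
    return u.T @ cov_isqrt                     # unmixing matrix W
```

By construction W diagonalizes both the sample covariance (to the identity) and the sample LCov_f matrix, as required by the definition.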
Definition 3. Consider a random field x(s) following the SBSS model in Definition 1. The whitened version of x(s) is defined by x_wt(s) = LCov_0^{-1/2}(x(s))(x(s) − m). For a given set of kernel functions F = {f_1, ..., f_K}, U is the p × p orthogonal joint diagonalization matrix which maximizes

Σ_{k=1}^K ∥diag(U⊤ LCov_{f_k}(x_wt(s)) U)∥²_F.

Then, the unmixing matrix functional is given by W = U⊤ LCov_0^{-1/2}(x(s)). In Definition 3, diag(M) denotes the diagonal matrix whose diagonal elements are the ones of the matrix M, and ∥·∥_F denotes the Frobenius matrix norm. For a given realization of x(s) on n sample locations, the sample versions of the K LCov_{f_k}(x_wt(s)) matrices usually do not commute due to the estimation error (two square matrices A and B of the same dimension commute if AB = BA). In order to find a matrix U which exactly jointly diagonalizes all the K LCov_{f_k}(x_wt(s)) matrices, it is required that these matrices commute. To solve this issue, algorithms that approximately jointly diagonalize these matrices are usually used in BSS. We use the algorithm described by Cardoso and Souloumiac (1996), which is based on iterative applications of Givens rotations. In contrast to Definition 2, the method of Definition 3 is more robust to the choice of spatial kernel functions but relies on a more sophisticated joint diagonalization algorithm.
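A compact numpy sketch of such a Givens-rotation scheme is given below. It follows the spirit of the Cardoso and Souloumiac (1996) algorithm for real symmetric matrices, but the stopping rule and update order are our own simplifications:

```python
import numpy as np

def joint_diagonalize(mats, eps=1e-12, max_sweeps=100):
    """Approximate joint diagonalization of symmetric matrices by Givens
    rotations. Returns an orthogonal U such that U.T @ M @ U is nearly
    diagonal for every M in mats."""
    A = np.stack([np.asarray(M, dtype=float) for M in mats])
    p = A.shape[1]
    U = np.eye(p)
    for _ in range(max_sweeps):
        rotated = False
        for i in range(p - 1):
            for j in range(i + 1, p):
                # 2x2 subproblem: the optimal angle comes from the top
                # eigenvector of G = sum_k g_k g_k^T, g_k = (a_ii - a_jj, 2 a_ij)
                g = np.stack([A[:, i, i] - A[:, j, j], 2 * A[:, i, j]], axis=1)
                _, evecs = np.linalg.eigh(g.T @ g)
                v = evecs[:, -1]                  # (cos 2t, sin 2t)
                if v[0] < 0:
                    v = -v
                c = np.sqrt(0.5 + v[0] / 2)
                s = 0.5 * v[1] / c
                if abs(s) > eps:
                    rotated = True
                    R = np.eye(p)
                    R[i, i] = R[j, j] = c
                    R[i, j], R[j, i] = -s, s      # Givens rotation
                    A = R.T @ A @ R               # rotate all matrices at once
                    U = U @ R
        if not rotated:
            break
    return U
```

For matrices that are exactly jointly diagonalizable the sweeps converge quickly; for noisy sample LCov matrices the result is the approximate joint diagonalizer used in Definition 3.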
The above two methods are formulated for the SBSS model where the drift is assumed to be constant for all sample locations. Practically, when considering data with a non-constant drift function, the performance of the SBSS methods above can degrade heavily when the population moments are replaced by their sample counterparts, especially considering that E(x(s_i)) is estimated by the sample mean for all i = 1, ..., n. This effect might be reduced by relying on differences, as is usually done in geostatistics, where the variogram is favored over the covariance matrix. We will adapt this idea for SBSS in the following.

Connection between SBSS and PCA
Generally, the PCA solution is computed by centering the data and projecting the centered data on the eigenvectors of the covariance matrix. If we write the eigendecomposition of the covariance matrix of the observable random field x(s) as Cov(x(s)) = UDU⊤ and recall that an SBSS unmixing matrix can be written as W = Ũ⊤ D^{-1/2} U⊤, we can state the principal components, the whitened version of x(s) and the estimated latent components as

U⊤(x(s) − m),   D^{-1/2} U⊤(x(s) − m)   and   Ũ⊤ D^{-1/2} U⊤(x(s) − m),

respectively. With this, the connections between SBSS and PCA can be summarized: PCA is only based on the on-site covariance matrix, whereas for SBSS the orthogonal matrix Ũ is found based on spatial second-order dependence (local covariance matrices are diagonalized). This means that PCA does not account for spatial dependence but only marginal dependence, and SBSS somewhat refines the PCA solution by additionally considering second-order spatial dependence. Moreover, both PCA and SBSS find scores and loadings, thus the solutions of both methods can be interpreted in the same loadings-and-scores methodology. More details on the connection between SBSS and PCA are given by Nordhausen et al. (2015), and a general discussion on PCA and BSS is given by Nordhausen and Oja (2018a).

Local difference matrices for SBSS
Relying on differences is common practice in geostatistics, where the variogram is the central quantity of structural analysis, much favored over the spatial covariance matrix; see textbooks such as the one by Chilés and Delfiner (2012). Differences are also beneficial for iid or time series data. In the latter case, differences are often used to stabilize the mean or to remove seasonality effects. A popular model that is based on differences is the Autoregressive Integrated Moving Average (ARIMA) model, in which differences are modeled as an Autoregressive Moving Average (ARMA) model (Luceño and Peña 2008). For the iid case, Nordhausen and Tyler (2015) emphasized that only scatter matrices which can be expressed in terms of differences possess the property that they are diagonal in every distributional case when the random vector is formed by statistically independent entries. The covariance matrix can be written in terms of differences. Furthermore, they showed theoretically that robust scatter matrices have this property as well when evaluated on differences, and practically that statistical methods such as independent component analysis (ICA) or observational regression greatly benefit from the use of differences. Therefore, we adapt local covariance matrices and introduce the following definition of local difference (LDiff) matrices, as firstly mentioned by Muehlmann et al. (2020):

LDiff_f(x(s)) = (1/n) Σ_{i=1}^n Σ_{j=1}^n f(s_i − s_j) E((x(s_i) − x(s_j))(x(s_i) − x(s_j))⊤).
Again, f is the spatial kernel function which determines the locality of the weighted average of differences, and the same rules apply as for LCov matrices. Note that when f is chosen to be the ring kernel, the sample versions of LDiff matrices are proportional to the typical empirical variogram, which is the core tool of structural analysis in geostatistics. Practically, for a given realization of x(s) on n sample locations, the key difference between LCov and LDiff matrices in this context is that LCov matrices rely on the sample average for the drift estimate, whereas LDiff matrices do not. Theoretically, when the random field has a non-constant drift function m(s), an additional additive term of the form m(s) − m(s′) occurs in LDiff matrices; this additional term can be viewed as a bias term. This bias is expected to be small when the kernel function is designed in such a way that the locations s and s′ are nearby, and under the assumption that the drift appears locally constant and would, for example, change only smoothly. Therefore, we suggest to replace the LCov matrix by an LDiff matrix in Definition 2. It would also be possible to adapt Definition 3 in such a way. However, we decided against this adaptation as usually the kernel functions are chosen in such a way that their parameters are increasing, e.g., ring kernels where the radii parameters increase for each kernel. For small radii this would still lead to small bias terms in LDiff matrices as m(s) − m(s′) can be assumed to be small, as described before. In contrast, larger radii lead to further separated coordinate pairs, which leads to larger bias terms, something we want to avoid with the former adaptation. Therefore, we only consider the adaptation of Definition 2 as follows.
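A sample version of the LDiff matrix can be sketched as follows; the 1/n normalization is our own illustrative choice, as the text only uses LDiff matrices up to proportionality:

```python
import numpy as np

def local_difference(x, coords, kernel):
    """Sample LDiff matrix: a kernel-weighted average of outer products
    of pairwise differences; no drift estimate is involved."""
    n = len(x)
    h = coords[:, None, :] - coords[None, :, :]   # pairwise spatial lags
    w = kernel(h)                                 # (n, n) kernel weights
    d = x[:, None, :] - x[None, :, :]             # pairwise data differences
    return np.einsum('ij,ija,ijb->ab', w, d, d) / n
```

Since only pairwise differences enter, adding any constant vector to the data leaves the LDiff matrix unchanged, which is exactly why no drift estimate is needed.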
Definition 4. For a random field x(s) following the SBSS model in Definition 1, the unmixing matrix functional W simultaneously diagonalizes the covariance matrix and one local difference matrix for a given kernel function f such that

W LCov_0(x(s)) W⊤ = I_p   and   W LDiff_f(x(s)) W⊤ = D_f,

where D_f is a diagonal matrix with increasingly ordered diagonal elements.
Again, W can be found by solving the generalized eigenproblem or by the two-step algorithm discussed above. So far we have outlined the contributions given by Muehlmann et al. (2020).
Note, however, that in the above definition the whitening step still relies on the use of LCov_0, which again leads to the disadvantages discussed before. Therefore, in the following we present our main theoretical contribution, namely, ways to treat the whitening step for non-constant drifts.

Adaptation of the whitening step
To define a robust version of the whitening step in the presence of a drift we suggest to replace the LCov_0 matrix by some LDiff matrix. Replacing the matrices which are used in steps 1 and 2 of the general outline on how to construct unmixing matrices (see the end of Section 2.1) is not a new idea in the BSS community. In the following we briefly discuss two examples: (i) Nordhausen, Oja, and Ollila (2008) and Oja, Sirkiä, and Eriksson (2006) discuss ways of formulating robust versions of ICA methods. They find the unmixing matrix by simultaneous diagonalization of two scatter matrices S_1 and S_2, where neither necessarily needs to be the covariance matrix. S_1 and S_2 are chosen to be robust to outliers. (ii) Belouchrani and Cichocki (2000) and Georgiev and Cichocki (2001) replaced the additive vector m in Equation (1) by a white noise process for the time series BSS case. In order to cancel the influence of such a white noise process on the whitening step they replaced the covariance by a weighted sum of several autocorrelation matrices. Georgiev and Cichocki (2001) denoted this adaptation as robust orthogonalization.
Replacing the matrix used in the whitening step for SBSS is possible. But to construct an unmixing matrix that follows the three-step principle as discussed at the end of Section 2.1, the SBSS model assumptions have to be adapted as follows. After the whitening step we only want to find an orthogonal matrix to recover the latent field. To achieve this it must be ensured that the matrix M, which is used for whitening, leads to an orthogonal matrix MA. In classical SBSS, M = LCov_0^{-1/2}(x(s)) and assumption (SBSS 2) takes care that MA is indeed orthogonal. If we now replace assumption (SBSS 2) of the SBSS model by assuming that LDiff_f(z(s)) = I_p, then MA is orthogonal when M = LDiff_f^{-1/2}(x(s)). With this adaptation we achieve the goal of replacing the standard covariance matrix by a local difference matrix in the whitening step, which is more robust in the presence of a drift. However, this advantage comes at the cost that the covariance matrix of the estimated latent field is no longer the identity matrix. This procedure is formulated in the following definition.
Definition 5. Consider two spatial kernel functions f_1 and f_2. For a random field x(s) following the SBSS model in Definition 1 where assumption (SBSS 2) is replaced by: (SBSS 2*): LDiff_{f_1}(z(s)) = I_p for all s ∈ S.
The unmixing matrix functional W simultaneously diagonalizes the corresponding two local difference matrices such that

W LDiff_{f_1}(x(s)) W⊤ = I_p   and   W LDiff_{f_2}(x(s)) W⊤ = D_{f_1 f_2},

where D_{f_1 f_2} is a diagonal matrix with increasingly ordered diagonal elements.
As shown by Cichocki and Amari (2002), Theorem 4.2, the mixing matrix can be found when LDiff_{f_1}(x(s)) is symmetric positive definite, LDiff_{f_2}(x(s)) is symmetric, and the generalized eigenvalues (the diagonal elements of LDiff_{f_2}(z(s))) are distinct. It is easy to see that LDiff matrices are symmetric and positive definite in the context of Definition 1 when considering ball, ring and Gaussian kernel functions with strictly positive parameters. Again, as in Definition 2, the unmixing matrix can alternatively be found in a two-step fashion where the whitening is now carried out by x_wt*(s) = LDiff_{f_1}^{-1/2}(x(s))(x(s) − m). Note that for this adapted whitening procedure LCov_0(x_wt*(s)) is not necessarily diagonal, and LCov_0(Wx(s)) is diagonal but not equal to I_p. Therefore, in this adaptation the convenience of fixing the scale of z(s) is given up for the sake of practically more robust whitening with respect to a present drift. In practical considerations, Wx(s) might be standardized by scaling each component to unit variance if the missing identity scale of z(s) leads to problems.
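The two-step estimator of Definition 5 can be sketched as follows; the LDiff helper from before is repeated to keep the snippet self-contained, and the function names and the 1/n normalization are our own choices. Note that no centering is needed anywhere, which is the point of the difference-based whitening:

```python
import numpy as np

def local_difference(x, coords, kernel):
    # kernel-weighted average of outer products of pairwise differences
    n = len(x)
    h = coords[:, None, :] - coords[None, :, :]
    w = kernel(h)
    d = x[:, None, :] - x[None, :, :]
    return np.einsum('ij,ija,ijb->ab', w, d, d) / n

def sbss_diff_whitened(x, coords, kernel1, kernel2):
    """Definition 5 sketch: whiten with LDiff_{f1}^{-1/2} (no mean needed),
    then eigendecompose LDiff_{f2} of the whitened field."""
    L1 = local_difference(x, coords, kernel1)
    evals, evecs = np.linalg.eigh(L1)
    L1_isqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    x_wt = x @ L1_isqrt                       # L1_isqrt is symmetric
    L2 = local_difference(x_wt, coords, kernel2)
    d, u = np.linalg.eigh(L2)                 # increasing order fixes D_{f1 f2}
    return u.T @ L1_isqrt                     # unmixing matrix W
```

By construction, W maps LDiff_{f_1} to the identity and LDiff_{f_2} to a diagonal matrix with increasing diagonal, as in the definition.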
The kernel functions f_1 and f_2 need to be specified by the user. Ring, ball, Gaussian or a mixture of these kernels are possible. The choice of the parameters for these two functions is important. There is always a tradeoff between the bias introduced by a possible non-constant drift, which favors smaller parameters, and the need for enough sample location pairs in the estimation of the two matrices LDiff_{f_1} and LDiff_{f_2}, which favors larger parameters. We suggest to try out different values and use parameters that lead to a stable solution when the parameters are changed on a small scale.
We also want to emphasize that the whitening step in Definitions 2 and 3 can be adapted in a similar way by replacing LCov_0 by LCov_f for some spatial kernel function f, but in contrast to the above outline LCov_f is not necessarily positive definite, which might be overcome by the approach described by Belouchrani and Cichocki (2000).
If the data show a non-constant drift m, then the former methods are able to produce an unmixing matrix that is robust to this non-constant drift. Still, it is a problem to fully recover the latent field z(s) as the unknown drift function needs to be subtracted. In the following we briefly discuss ways of dealing with this in different possible downstream analysis tasks.

Comments on the drift
For data that are generated by the model of Definition 1, the latent random field can be recovered by z(s) = W(x(s) − m) up to sign and order (and also scale when considering the context of Definition 5). But often one finds in practical considerations that the data at hand show a non-constant drift, which violates the SBSS model assumption of a constant mean. In such a situation we suggest to use the methods of Definition 4 or 5 for the reasons outlined above, compute y(s) = Wx(s) and handle the present drift in one of the following ways, based on the aim of further analysis of the data. All of the following relies on the fact that y(s) consists of uncorrelated entries, which is the great advantage of the SBSS framework.
If one is interested in predicting x(s) at some unobserved location s*, each entry of y(s) can be treated individually. Universal kriging would be the natural choice as it is designed for the presence of a drift, but any other prediction tool might be used as well; see textbooks such as the one by Chilés and Delfiner (2012). After predicting each entry of y(s) individually, the vector of predictions ŷ(s*) can be formed and transformed by W^{-1} ŷ(s*) to obtain a prediction of the original field x(s*). This replaces the use of multivariate prediction tools by p univariate ones, which reduces the complexity of modeling significantly. This procedure was already investigated and described in detail for the constant drift case by Muehlmann et al. (2021b).
If the aim of further analysis is to recover the latent field z(s) without any drift, then for each entry of y(s) the still present non-constant drift needs to be estimated and subtracted. This can be done by any univariate method that is designed to estimate a non-constant drift of a univariate random field. The choice of the method and the possible implications of such a choice cannot be discussed in the present context and need to be handled by the user. We only want to emphasize that if a drift for each entry of y(s) is estimated, then a vector of these predicted drifts m̂(s) can be formed and subtracted from y(s), which results in a practical estimation of the latent random field without drift. Additionally, the predicted drift can be back-transformed by W^{-1} m̂(s) to obtain an estimate of the original drift of x(s). As before, the great advantage is that this procedure replaces the use of one multivariate model by p univariate ones.
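This per-component workflow can be sketched as follows; the linear-in-coordinates drift model is a purely illustrative assumption (any univariate drift estimator could take its place), and the function name `remove_drift` is our own:

```python
import numpy as np

def remove_drift(y, coords, W):
    """Per-component drift handling sketch: fit a drift model to each
    (uncorrelated) entry of y(s) = W x(s) separately, subtract it, and
    back-transform the fitted drift with W^{-1}."""
    n = len(y)
    # design matrix: intercept + coordinates (hypothetical drift model)
    B = np.column_stack([np.ones(n), coords])
    coef, *_ = np.linalg.lstsq(B, y, rcond=None)   # p univariate fits at once
    m_hat = B @ coef                               # fitted drift of y(s)
    z_hat = y - m_hat                              # drift-free latent estimate
    m_x_hat = m_hat @ np.linalg.inv(W).T           # W^{-1} m_hat(s), row-wise
    return z_hat, m_x_hat
```

Because the entries of y(s) are uncorrelated, the least-squares fit above is genuinely p separate univariate regressions, mirroring the replacement of one multivariate model by p univariate ones.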

SBSS for intrinsic stationary random fields
In the following we discuss an adaptation of the SBSS model in Definition 1 for which the procedure of Definition 5 is the natural choice. We adapt the SBSS model (Definition 1) by assuming that z(s) is formed by uncorrelated intrinsic stationary random fields. More formally, this amounts to replacing assumptions (SBSS 2) and (SBSS 3) in the following way.
(SBSS 2*): Cov(z(s)) = E(z(s) z(s)⊤) = D(s) for all s ∈ S, where D(s) is a diagonal matrix with strictly positive diagonal entries, and
(SBSS 3*): E((z(s) − z(s′))(z(s) − z(s′))⊤) = K(s − s′) for all s, s′ ∈ S with s ≠ s′, where K is diagonal, holding two times the variogram of each entry of z(s) on its diagonal.
For this model the estimation of LCov_f(x(s)) can be highly corrupted by the fact that the spatial covariance function for intrinsic stationary random fields is a function of the sample locations themselves and not of the distance between them. In contrast, the estimation of LDiff matrices is not corrupted as the variograms are still only dependent on the distance between sample locations. Therefore, the only choice for estimating the unmixing matrix which is consistent with the adapted SBSS model is the one from Definition 5, something we will also confirm by the simulations in the following section.

Simulations
In this part we validate the performance of our above introduced estimators by carrying out experiments on synthetic datasets. We use the statistical software R 3.6.1 (R Core Team 2021) with the help of the packages JADE (Miettinen, Nordhausen, and Taskinen 2017), SpatialBSS (Muehlmann, Nordhausen, and Virta 2021a) and RandomFields (Schlather, Malinowski, Menck, Oesting, and Strokorb 2015).
In the following simulations we focus on two-dimensional domains (d = 2) as this is usually the dimension of the coordinate set in spatial data analysis. The domains are of the form [0, l] × [0, l] with l ∈ {10, 20, 30, 40, 50, 60}, denoted as l × l, where the sample locations follow either a uniform or a skewed design. The distributions for the entries of a sample location s = (s_1, s_2)⊤ are s_1 ~ U(0, 1) and s_2 ~ U(0, 1) for the uniform setting and s_1 ~ β(2, 4) and s_2 ~ U(0, 1) for the skewed setting, where U and β denote the uniform and beta distributions, respectively. A set of sample locations for a given domain is formed by drawing l² iid samples from the former distributions, which are then multiplied by l. For a given random field we estimate the unmixing matrix by the following five BSS methods.
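Both location designs can be sketched as follows (the function name is our own):

```python
import numpy as np

def sample_locations(l, design="uniform", rng=None):
    """Draw l^2 sample locations on the l x l domain: the uniform design,
    or the skewed design with s1 ~ beta(2, 4) and s2 ~ U(0, 1),
    both scaled by l."""
    rng = np.random.default_rng() if rng is None else rng
    n = l * l
    if design == "uniform":
        s = rng.uniform(0, 1, (n, 2))
    elif design == "skewed":
        s = np.column_stack([rng.beta(2, 4, n), rng.uniform(0, 1, n)])
    else:
        raise ValueError(design)
    return l * s
```

The skewed design concentrates locations toward the left edge of the domain, since the β(2, 4) distribution has mean 1/3.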
The first two estimators are SBSS with LCov matrices, namely the one from Definition 2 using a ball kernel with the parameter r = 1 (LCov Ball), and the one from Definition 3 using three ring kernels with parameters (r_in, r_out) ∈ {(0, 1), (1, 2), (2, 3)} (LCov Ring). Furthermore, we use two SBSS estimators with LDiff matrices, namely the one from Definition 4 using a ball kernel with the parameter r = 1 (LDiff Ball), and the one from Definition 5 where f_1 is the ring kernel function with (r_in, r_out) = (0, 1) and f_2 is also a ring kernel function with (r_in, r_out) = (1, 2) (wLDiff Ring). Lastly, we use the well-known fourth order blind identification (FOBI) algorithm (Cardoso 1989), which is an ICA method designed for iid data and therefore does not utilize spatial dependence information at all.
We evaluate the quality of the unmixing matrix estimation by computing the minimum distance index (MDI) (Ilmonen, Nordhausen, Oja, and Ollila 2010; Lietzen, Virta, Nordhausen, and Ilmonen 2020), which is defined by

MD(Ŵ) = (p − 1)^{−1/2} inf_{C ∈ M} ‖C Ŵ A − I_p‖_F.

Here, Ŵ is the unmixing matrix estimate and M is the set of all matrices of the form PDS, where P is a permutation matrix, D is a diagonal matrix with positive diagonal entries and S is a sign-change matrix. As discussed above, ŴA should equal I_p up to scale, sign and order, which are exactly the indeterminacies captured by the set M. Loosely, the MDI measures the deviation of ŴA from I_p with respect to these indeterminacies, where it takes values between 0, indicating a perfect estimate, and 1, indicating a poor estimate. Note that the MDI only depends on ŴA, which for affine equivariant BSS methods always equals the unmixing matrix estimate obtained in the case A = I_p; for details see for example the outline by Bachoc et al. (2020). Hence, the results are equal independently of the specific form of A, which favors A = I_p as a convenient choice for simulations. In the following we present results for two different random field models.
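The MDI admits a well-known computable form (Ilmonen et al. 2010): normalize the rows of Ĝ = ŴA to unit length, then find the permutation maximizing the matched squared entries, which is a linear assignment problem. A minimal sketch, written here in Python rather than the JADE R package used in the paper:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def mdi(G):
    """Minimum distance index of G = W_hat @ A.

    Uses MD(G)^2 = (p - max_perm sum_i gt[perm(i), i]^2) / (p - 1), where gt
    is G with rows normalized to unit length; the optimal permutation is found
    by linear assignment. This follows by minimizing over the scale of each
    row of C @ G separately, as in the definition in the text.
    """
    G = np.asarray(G, dtype=float)
    p = G.shape[0]
    gt = G / np.linalg.norm(G, axis=1, keepdims=True)
    gain = gt ** 2
    rows, cols = linear_sum_assignment(-gain)  # maximize matched squared entries
    return np.sqrt((p - gain[rows, cols].sum()) / (p - 1))
```

A perfect estimate gives 0: any matrix that is a permutation, sign change and positive rescaling of the identity scores exactly 0.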

Weakly stationary latent field with external drift
In this simulation, we consider the random field model of Equation (1) where we choose p = 3 and A = I_3. The latent random field is formed by three centered Gaussian random fields, where the second order dependence is determined by the well-known second order stationary Matérn covariance function (Guttorp and Gneiting 2006; Chilès and Delfiner 2012) defined by

C(h) = σ² (2^{1−ν} / Γ(ν)) (‖h‖/φ)^ν K_ν(‖h‖/φ).

Here, σ², ν and φ are the variance, shape and range parameters, respectively, and K_ν denotes the modified Bessel function of the second kind. The parameters are chosen to be (σ², ν, φ) ∈ {(1.0, 0.5, 1.0), (1.0, 0.9, 1.7), (1.0, 1.3, 2.2)}, which are illustrated in Figure 2. For the 3-dimensional drift function m(s) = (m_1(s), m_2(s), m_3(s))^⊤ we consider the following four settings.
Drift 1: For this case we choose m(s) = 0 for all sample locations. Therefore, this model reduces to the constant drift case which was examined in detail by Nordhausen et al. (2015) and Bachoc et al. (2020).
Drift 3: Here, we consider a linear drift in the first direction of the sample locations. For i = 1, 2, 3, m_i(s) equals c_i s_1 / s_1^*, where the constants are (c_1, c_2, c_3) = (0.7, 1, 1.2), s_1 is the first entry of the sample location s and s_1^* is the maximum value of all first entries for the given set of sample locations. Therefore, the trend ranges in (0, c_i] for all different domain sizes.

Drift 4: In the same fashion as in Drift 2 we sample three artificial sample locations inside the domain at hand. These artificial locations define three cluster centers; a point belongs to the cluster whose center has minimal Euclidean distance to it. For each cluster j = 1, 2, 3 the corresponding drift value m_i^j is a sample from the uniform distribution U(0, 3). Consequently, the drift for this model lies in the interval [0, 3]. Overall, this setting is constituted by an observable random field x(s) that is weakly stationary in each cluster of sample locations, which is often denoted as a block-stationary model.
One example of the different drift settings described above is depicted in Figure 3. Note that the on-site variance of each entry of z(s) equals one and, moreover, the latent field is Gaussian. Therefore, over 99% of the values of z(s) lie in the interval [−3, 3], which highlights the impact of the ranges of the different drift settings.
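A latent Gaussian field with Matérn covariance can be simulated exactly (for small n) by a Cholesky factorization of the covariance matrix. The sketch below assumes the parameterization stated above and is written in Python; the paper's experiments use the RandomFields R package instead.

```python
import numpy as np
from scipy.special import gamma, kv

def matern_cov(h, sigma2=1.0, nu=0.5, phi=1.0):
    """Matern covariance C(h) = sigma2 * 2^(1-nu)/Gamma(nu) * (h/phi)^nu * K_nu(h/phi).

    Parameterization as reconstructed from the text (variance sigma2, shape nu,
    range phi); other scalings of h/phi exist in the literature. Returns an
    array; C(0) = sigma2 by continuity.
    """
    h = np.atleast_1d(np.asarray(h, dtype=float))
    c = np.full(h.shape, sigma2)
    pos = h > 0
    u = h[pos] / phi
    c[pos] = sigma2 * (2.0 ** (1.0 - nu) / gamma(nu)) * u ** nu * kv(nu, u)
    return c

def simulate_gaussian_field(coords, cov_fun, rng):
    # Exact simulation via Cholesky of the dense covariance matrix (small n only).
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    C = cov_fun(d) + 1e-10 * np.eye(len(coords))  # jitter for numerical stability
    return np.linalg.cholesky(C) @ rng.standard_normal(len(coords))
```

With nu = 0.5 the Matérn reduces to the exponential covariance sigma2 * exp(-h/phi), a convenient sanity check for the implementation.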
The average MDI values based on 2000 repetitions for the uniform coordinate pattern are presented in Figure 4. Overall, for all considered drift settings the SBSS methods that are based on local difference matrices show much better performance. This is especially surprising for Drift 1, as it follows the original SBSS model. The adapted whitening step seems to be of minor influence, with Drift 4 being an exception: in this setting the performance of LDiff Ball seems to stay constant even when the sample size increases. Similarly, the performance difference between SBSS methods that simultaneously or jointly diagonalize LCov matrices is not as significant. Figure 5 depicts the average MDI for the skewed sample location pattern. The qualitative picture is very similar to the uniform setting, with two minor exceptions. Firstly, the overall MDI is slightly higher for all settings and estimators, which might be explained by the fact that sample locations are dense only on the left part of the domain. Therefore, the effective sample size decreases when local covariance or local difference matrices are estimated, as certain sample locations do not have neighbors that are captured by the spatial kernel functions. Secondly, SBSS methods that are based on LDiff matrices do not suffer such a significant reduction of performance as the ones using LCov matrices.

Intrinsic stationary latent field with constant drift
In this simulation we again consider the SBSS model of Equation (1) where p = 3, A = I_3 and m = 0. In contrast to Definition 1, the latent random field is now formed by uncorrelated intrinsic stationary random fields, which is an example for the outline in Section 3.3. Specifically, we choose the second order dependence of the entries of z(s) to follow the fractional Brownian field covariance function (Dobrić and Ojeda 2006) defined by

C(s_1, s_2) = (‖s_1‖^{2H} + ‖s_2‖^{2H} − ‖s_1 − s_2‖^{2H}) / 2.

In this equation, H ∈ (0, 1] is denoted as the Hurst parameter. The fractional Brownian field is a well-known intrinsic stationary but not weakly stationary random field model. We choose the Hurst parameters of the latent field to be H ∈ {0.3, 0.5, 0.8}, where Figure 6 depicts one example.
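The intrinsic (but not weak) stationarity can be checked numerically: under the covariance above, the variance of an increment z(s) − z(t) depends only on ‖s − t‖, namely Var(z(s) − z(t)) = ‖s − t‖^{2H}, while the pointwise variance ‖s‖^{2H} grows with the location. A sketch of the covariance function, assuming the standard form:

```python
import numpy as np

def fbf_cov(s, t, H=0.5):
    """Fractional Brownian field covariance (standard form, reconstructed):
    C(s, t) = 0.5 * (||s||^(2H) + ||t||^(2H) - ||s - t||^(2H)), z(0) = 0.
    """
    s = np.asarray(s, dtype=float)
    t = np.asarray(t, dtype=float)
    return 0.5 * (np.linalg.norm(s) ** (2 * H)
                  + np.linalg.norm(t) ** (2 * H)
                  - np.linalg.norm(s - t) ** (2 * H))
```

The identity Var(z(s) − z(t)) = C(s, s) + C(t, t) − 2 C(s, t) = ‖s − t‖^{2H} is exactly why difference-based scatter matrices remain estimable for such fields while LCov matrices are not.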
Figure 4: Average MDI based on 2000 iterations for the uniform sample location pattern where the observed random field is formed by a weakly stationary latent field and four different additive drift functions.
The average MDI values based on 2000 simulation iterations for the uniform and skewed sample location patterns are presented in Figure 7. The SBSS methods that rely only on LCov matrices show a very poor performance. This is expected, as the spatial covariance of z(s) depends on the actual sample location, which in turn makes proper estimation of LCov matrices impossible. In contrast, the SBSS methods relying on LDiff matrices still perform well because the variance of the difference processes of the random field at hand still depends only on the distance between sample locations. Clearly, the method with the adapted whitening step shows the overall best performance, as its whitening step is also not corrupted by the intrinsic stationary nature of the observed random field x(s).

Data example
In the following we apply our proposed adaptation of SBSS to a dataset derived from the GEMAS project (Reimann, Birke, Demetriades, Filzmoser, and O'Connor 2014), which was also considered by Muehlmann et al. (2020) and Muehlmann, Bachoc, and Nordhausen (2022a). The aim of this analysis is to illustrate how the adapted version of SBSS can be applied in a real-world application; moreover, we show that the algorithms are able to find latent components with substantial spatial dependencies. The main research questions which could be answered by SBSS and the introduced adaptations might be: Are there some important physical processes with substantial spatial dependence driving the structure of the data which can be interpreted in terms of loadings and scores? Which linear transformation results in spatially uncorrelated components? We try to answer these questions for a dataset which is freely available in the R package robCompositions (Templ, Hron, and Filzmoser 2011). The data are formed by concentration measurements in mg/kg of 18 elements (Al, Ba, Ca, Cr, Fe, K, Mg, Mn, Na, Nb, P, Si, Sr, Ti, V, Y, Zn, Zr) at 2107 samples of agricultural soils in Europe, with locations given in latitude and longitude coordinates.
As is common practice in regional geochemistry, we respect the relative information of the data by using principles of compositional data analysis (Aitchison 2003). Specifically, we carry out a three-step analysis. Firstly, we transform the dataset into centered log-ratio (clr) coordinates. Clr coordinates are often used for interpreting the results as they capture the dominance of the considered variable over the geometric mean of all the others. But clr coordinates have the disadvantage that the components of each observation add up to zero, which in turn means that, for example, the covariance matrix is not invertible, something which is needed in BSS. Therefore, we transform the data into isometric log-ratio (ilr) coordinates by using pivot coordinates, which is only an orthogonal transformation of the clr data; this orthogonal matrix is usually denoted as the contrast matrix (Filzmoser, Hron, and Templ 2018).
Interpretations of the results in terms of the ilr coordinates can be considered more difficult as the additional orthogonal transformation needs to be taken into account. Therefore, we prefer interpretations in terms of clr coordinates. As ilr coordinates are indeed of full rank, all SBSS methods can be carried out in that space, and the so-called combined loadings matrix consists of the matrix product of the contrast matrix and the estimated unmixing matrix. Interpretations of the results are carried out with the combined loadings matrix in terms of the clr coordinates. This procedure is detailed by Nordhausen et al. (2015), where the original SBSS is applied in the context of regional geochemistry. The ilr transformation reduces the dimension of the data from p = 18 to p = 17.
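The clr/ilr step can be sketched as follows. We use a generic orthonormal contrast matrix built from a Helmert basis; the paper's pivot coordinates are one particular choice of such a matrix, so the matrix below is an illustrative stand-in, not the exact transformation used in the analysis.

```python
import numpy as np

def clr(x):
    # Centered log-ratio: log of each part over the geometric mean per sample;
    # each transformed row sums to zero, so the clr data are rank deficient.
    lx = np.log(x)
    return lx - lx.mean(axis=1, keepdims=True)

def contrast_matrix(p):
    """An orthonormal p x (p-1) contrast matrix V whose columns span the clr
    hyperplane {y : sum(y) = 0}, built from a Helmert-type basis."""
    V = np.zeros((p, p - 1))
    for j in range(1, p):
        V[:j, j - 1] = 1.0 / j
        V[j, j - 1] = -1.0
        V[:, j - 1] /= np.linalg.norm(V[:, j - 1])
    return V

def ilr(x):
    # Full-rank (p-1)-dimensional coordinates: ilr = clr @ V; since V is
    # orthonormal on the hyperplane, clr is recovered as ilr @ V.T.
    return clr(x) @ contrast_matrix(x.shape[1])
```

Because V is an orthonormal basis of the clr hyperplane, any SBSS unmixing matrix estimated on the ilr data can be mapped back via the product with V, which is the combined loadings matrix described above.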
It was already shown by Muehlmann et al. (2020) that the norms of the differences between the global average and the measurements on-site are not constant throughout the spatial domain, which suggests using LDiff matrices in favor of LCov matrices. In contrast to the outline by Muehlmann et al. (2020), where the method of Definition 4 is utilized, we analyze the data with the adapted whitening step seen in Definition 5. Specifically, we choose f_1 and f_2 to be ring kernel functions with the parameters (0°, 2°) and (2°, 4°), respectively. We choose these values based on the practical guidelines given in Section 7.
Figure 8 presents the first two entries of the estimated latent field and Table 1 shows the corresponding first two rows of the combined loadings matrix. The first entry is mainly driven by Aluminum and Silicium, based on the high absolute values of the combined loadings. Therefore, in the glacial sediments of northern central Europe this indicates that Silicium is a dominant element, whereas Aluminum is more dominant in the southern areas, which is confirmed by the original clr maps. For the second component it is interesting to see that Aluminum and Vanadium as well as Iron and Potassium show roughly the same absolute values with different signs. Accordingly, Iron and Potassium are dominating in the geochemical composition of the soils in southern and eastern Spain, where the soils are formed differently than in the remaining part of the country and in Portugal. This dominance is also visible in southern Italy, in the Baltic countries and in parts of the Ukraine. In contrast, dominance of Aluminum-Vanadium is observed in soils along the Alps, in parts of the Balkan Peninsula and the Carpathian Mountains, but also in the western part of the UK and in Norway. A thorough interpretation of the results might be carried out by domain experts.

Results
The main theoretical result is provided by the method formulated in Definition 5. This novel method is based on the original SBSS framework (see Section 2), but the whitening step is carried out with respect to a local difference matrix. This adaptation ensures that a drift, which is assumed to be locally constant, has minor impact on the quality of the source separation compared to the original SBSS methods or the adaptation provided by Muehlmann et al. (2020). Furthermore, this adaptation leads to a method that can be used on intrinsic stationary random fields, a broader class of processes compared to the second-order stationary ones which are needed for SBSS. These two statements are investigated in a thorough simulation study. In the first simulations, random fields were contaminated with several additive non-constant mean functions. As shown in Figure 4 and Figure 5, the method seen in Definition 5 indeed outperforms all contender SBSS methods. A similar outcome is seen in the results for the second simulations in Figure 7, where the latent random field is intrinsic stationary. We also showed how our newly introduced method can be applied to a geochemical data example; Figure 8 depicts the first two found latent components.

Discussion
In this paper we recalled SBSS, which is an unsupervised statistical tool that finds uncorrelated latent fields given a multivariate observed random field. This methodology offers one way to deal with multivariate spatial data in such a way that univariate methods can be used, a very appealing approach as multivariate spatial data are generally demanding to model. For the original SBSS method this useful feature was already investigated for interpolation tasks by Muehlmann et al. (2021b). However, the original SBSS methods rely on the strong assumption that the drift is constant over the whole considered spatial domain, which is very restrictive in practice. We introduced a new scatter matrix which measures second order spatial dependence based on differences, and argued that replacing local covariance matrices with local difference matrices yields advantages when a drift is present in the data. We also adapted the whitening step of the SBSS methods by such a replacement. This leads to the main theoretical contribution, namely the method seen in Definition 5.
Specifically, this method uses simultaneous diagonalization of two local difference matrices. This completely avoids the estimation of the mean and leads to a higher quality source separation when the drift can be assumed to be locally constant. Still, the vague assumption of a locally constant mean needs to be formalized in a mathematical way. The challenge in this task arises from the fact that the locality is determined by the parameters of the used kernel functions. For example, when using two ring kernels with larger radii, the drift needs to be constant along larger distances compared to two ring kernels with smaller radii parameters. This interdependence between drift and kernel parameters makes it challenging to mathematically define local constancy, a task for future research. Nevertheless, the simulations showed that the novel method introduced in Definition 5 leads to a higher quality estimation for a broad selection of different drift functions.
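The two-matrix construction can be sketched as follows (assuming ring-kernel LDiff matrices as in the simulations; the normalization constants and the exact algorithm of Definition 5 may differ): whiten with the inverse square root of the first LDiff matrix, then rotate with the eigenvectors of the whitened second one.

```python
import numpy as np

def ldiff(x, coords, r_in, r_out):
    # Local difference matrix with a ring kernel I(r_in < ||h|| <= r_out);
    # normalization by the pair count is an assumed convention.
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    i, j = np.nonzero((d > r_in) & (d <= r_out))
    diffs = x[i] - x[j]
    return diffs.T @ diffs / max(len(i), 1)

def sbss_two_ldiff(x, coords, k1=(0.0, 1.0), k2=(1.0, 2.0)):
    """Unmixing via simultaneous diagonalization of two LDiff matrices:
    whiten with LDiff_f1^(-1/2), then eigendecompose the whitened LDiff_f2.
    A sketch of the two-scatter idea, not the paper's exact estimator.
    """
    m1 = ldiff(x, coords, *k1)
    vals, vecs = np.linalg.eigh(m1)
    m1_isqrt = vecs @ np.diag(vals ** -0.5) @ vecs.T   # LDiff_f1^(-1/2)
    m2w = m1_isqrt @ ldiff(x, coords, *k2) @ m1_isqrt
    _, u = np.linalg.eigh(0.5 * (m2w + m2w.T))         # rotation after whitening
    return u.T @ m1_isqrt                              # estimated unmixing W
```

By construction, W LDiff_{f1} W^⊤ = I and W LDiff_{f2} W^⊤ is diagonal, which is exactly the simultaneous diagonalization property; no mean estimate enters at any step.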
The original motivation for the adapted whitening step by Belouchrani and Cichocki (2000) is in the context of BSS for time series data with additive white noise. Such a model can also be considered for the spatial case, in which the drift m would be replaced by a centered p-variate white noise process n(s) with Cov(n(s)) = M in Equation (1). This might be viewed as an external nugget effect. In such a case, when considering f_0, ball or Gaussian spatial kernel functions, LCov_f(x(s)) equals A LCov_f(z(s)) A^⊤ + M, but for a ring kernel function the additive term M would disappear. Therefore, the source separation would be influenced by the covariance matrix of the noise when using f_0, ball or Gaussian spatial kernel functions. This is, as for the case with a drift, a problem in the whitening step. Thus, proper estimation of W could be carried out with an adapted whitening step where only ring kernel functions are used. The problem here is that local covariance matrices with ring kernels are not necessarily positive definite. Hence, the computation of the inverse square root of this quantity is not always possible. This issue can be solved by using a weighted average of many local covariance matrices with different ring kernel parameters. When the weights are found by a sophisticated algorithm (Belouchrani and Cichocki 2000), the resulting matrix is positive definite and whitening is possible. Note that for this case LDiff matrices would be useless, as they always carry on-site covariance terms independent of the used spatial kernel function. This in turn means that the covariance matrix of the noise would always be an additive term for local difference matrices.
In Section 5 we showed how the new method can be applied to a dataset where it is not clear a priori whether a non-constant drift is present. The challenge for such a dataset is the choice of the two kernel functions and their parameters. Choosing useful parameters involves a trade-off between the following two effects. Larger radii parameters lead to a higher number of observations for the estimation of the two involved local difference matrices, which favors larger radii. In contrast, smaller parameters are better suited to a drift function that changes more prominently in space. In total, there is a trade-off between the sample size available for estimation and how quickly the drift changes over space. A practical guideline on how to choose good parameters can be given as follows. If the SBSS model holds, then a broad set of parameters leads to the true latent field (up to sign and order). Therefore, satisfying parameters can be found by a grid search over the parameter space. The parameters that lead to similar solutions are the ones which can be kept. The parameters that lead to unstable solutions are highly likely to violate the identifiability conditions and should be discarded.
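The grid-search guideline can be sketched as follows: estimate an unmixing matrix for each candidate parameter pair, measure the similarity of two solutions W_1, W_2 by the minimum distance index of W_1 W_2^{-1} (which is zero when they agree up to permutation, scale and sign), and keep the parameters in the largest mutually similar group. The similarity threshold below is hypothetical and not from the paper.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def md(G):
    # Distance of G from a permutation-scale-sign matrix (closed form of the
    # minimum distance index); 0 means two unmixing estimates are equivalent.
    p = G.shape[0]
    gt = (G / np.linalg.norm(G, axis=1, keepdims=True)) ** 2
    r, c = linear_sum_assignment(-gt)
    return np.sqrt((p - gt[r, c].sum()) / (p - 1))

def stable_solutions(candidates, tol=0.1):
    """Flag kernel-parameter candidates whose unmixing estimates agree.

    candidates: dict mapping a parameter label to its estimated W. Two
    solutions count as similar when md(W1 @ inv(W2)) < tol; the labels whose
    solution is similar to the most others are kept. tol is an assumption.
    """
    keys = list(candidates)
    sim = {k: 0 for k in keys}
    for a in range(len(keys)):
        for b in range(a + 1, len(keys)):
            dist = md(candidates[keys[a]] @ np.linalg.inv(candidates[keys[b]]))
            if dist < tol:
                sim[keys[a]] += 1
                sim[keys[b]] += 1
    best = max(sim.values())
    return [k for k in keys if sim[k] == best]
```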

Conclusions
Based on the theoretical and experimental results of the present paper, the following conclusions can be made:

• If a drift is present in the data, the quality of source separation will be higher when using the novel estimator seen in Definition 5, as confirmed by the simulation study.
• The radii parameters of the used kernel functions highly influence the found latent components. In practical considerations, suitable parameters can be determined by characteristic scales of spatial variation provided by domain experts, or by the strategy outlined in the previous section.
• The newly introduced method is also capable of recovering latent fields that are not second order stationary but intrinsic stationary. As the class of intrinsic stationary processes is broader than the class of second order stationary processes, the new method can be viewed as more general than the classical SBSS methods.
Figure 1 depicts one 20 × 20 example of the uniform pattern in the left panel and one 40 × 40 skewed pattern example in the right panel. The random field models which are simulated on these sample location patterns are discussed in the subsequent sections.

Figure 1: Example sample locations for the uniform pattern for a domain of size 20 × 20 (left) and for the skewed pattern for a domain of size 40 × 40 (right). The circles of radii 1, 2 and 3 represent the kernel function choices for SBSS.

Figure 3: Drifts for one simulation replicate of drift models 2-4 on a domain of size 20 × 20.

Figure 5: Average MDI based on 2000 iterations for the skewed sample location pattern where the observed random field is formed by a weakly stationary latent field and four different additive drift functions.

Figure 6: One example of the latent random field following the fractional Brownian field model on a 20 × 20 domain with uniform sample location pattern. The Hurst parameters equal 0.3, 0.5 and 0.8 for z_1, z_2 and z_3, respectively.

Figure 8: First (left) and second (right) entry of the estimated latent random field with the method of Definition 5, where f_1 and f_2 are ring kernel functions with parameters (0°, 2°) and (2°, 4°). Map tiles by Stamen Design, under CC BY 3.0. Data by OpenStreetMap, under ODbL.

Table 1: Combined loadings for the first (z_1) and second (z_2) entry of the latent random field.