Spatial Filtering with EViews and MATLAB

This article summarizes the ideas behind a few programs we developed for spatial data analysis in EViews and MATLAB. They allow the user to check for spatial autocorrelation using Moran's I and provide a spatial filtering procedure based on the $G_i$ statistic by Getis and Ord (1992). We have also implemented graphical tools such as Moran scatterplots for the detection of outliers or local spatial clusters.


Introduction
In recent years spatial econometric methods have gained in popularity. Nevertheless, the widely used econometric software package EViews does not contain functions for spatial data analysis. Therefore, we have developed a few scripts which can be started by a short entry in the command line. For users of the spatial econometrics toolbox by James P. LeSage, the MATLAB version of the functions is an appropriate extension.
The programs calculate the global and local Moran's I statistic for spatial autocorrelation and its moments using the normal approximation and the more accurate saddlepoint approximation by Tiefelsdorf (2002). It is possible to check the data for outliers or local spatial clusters with graphical tools such as Moran scatterplots. Finally, we have implemented a filtering procedure based on the $G_i$ statistic, which is another measure of local spatial dependence. This gives the user a straightforward way to handle the presence of spatial autocorrelation in the data. For further details about the programs and the theoretical background see Ferstl (2004).

Global Spatial Structures
The analysis of spatial data begins with a suitable measure to represent the inherent spatial dependence. The elements of a symmetric $n \times n$ spatial distance matrix $D$ are defined by
$$d_{ij} = \sqrt{(\theta_i - \theta_j)^2 + (\omega_i - \omega_j)^2},$$
where $(\theta, \omega)$ are the cartesian coordinates. In contrast to such a matrix, which measures dissimilarities, it is possible to use a spatial weight matrix to describe the similarities between the i-th and j-th spatial object.
There are several methods to transform a distance matrix into a global spatial weight matrix $G$. We implemented the widely used negative exponential function to model the distance decay (see e.g. Badinger et al., 2004). The geographical distances are therefore transformed to spatial weights by
$$g_{ij} = \exp(-\delta\, d_{ij}),$$
where $\delta > 0$ is the distance decay parameter.
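The two transformations can be illustrated with a short sketch. It is written in Python purely for illustration and is not one of the EViews or MATLAB programs; the coordinates and the value of the decay parameter are made up, and setting the diagonal of the weight matrix to zero is an assumption of the sketch.

```python
import math

def distance_matrix(coords):
    """Euclidean distances between all pairs of (theta, omega) points."""
    return [[math.hypot(t1 - t2, w1 - w2) for (t2, w2) in coords]
            for (t1, w1) in coords]

def exp_weights(dist, delta):
    """Negative exponential distance decay, g_ij = exp(-delta * d_ij).

    Assumption: the diagonal is set to zero so that no object counts
    as its own neighbour.
    """
    n = len(dist)
    return [[0.0 if i == j else math.exp(-delta * dist[i][j])
             for j in range(n)] for i in range(n)]

coords = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]  # hypothetical coordinates
D = distance_matrix(coords)
G = exp_weights(D, delta=0.1)                  # hypothetical decay parameter

for row in G:
    print([round(g, 4) for g in row])
```

A larger decay parameter makes the weights fall off faster with distance, so distant objects contribute less to every statistic built on G.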

Coding Schemes for Spatial Weight Matrices
The programs provide three common coding schemes for standardization of the spatial weight matrix $G$. Tiefelsdorf (2000) defines the linkage degree of the i-th spatial object by $d_i = \sum_{j=1}^{n} g_{ij}$. Therefore, the overall connectivity in $G$ is $D \equiv \sum_{i=1}^{n} \sum_{j=1}^{n} g_{ij}$. The globally standardized C-coding scheme is defined as
$$C \equiv \frac{n}{D}\, G.$$
In the row-sum standardized W-coding scheme the spatial weight matrix is transformed by
$$W \equiv [\mathrm{diag}(d)]^{-1} G,$$
where $d$ is a vector with the row-sums of $G$. In the variance stabilizing S-coding scheme Tiefelsdorf (2000) defines the vector $q$ by
$$q_i = \sqrt{\sum_{j=1}^{n} g_{ij}^2}.$$
As in the row-sum standardization the spatial weight matrix is transformed by $S^* \equiv [\mathrm{diag}(q)]^{-1} G$. Finally, it is scaled by
$$S \equiv \frac{n}{Q}\, S^*, \qquad Q \equiv \sum_{i=1}^{n} \sum_{j=1}^{n} s^*_{ij}.$$
Following the notation of Tiefelsdorf (2000), we use $V$ as a placeholder for a spatial weight matrix which has been transformed by one of the three mentioned coding schemes.
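As a rough illustration of the three coding schemes, the following Python sketch standardizes a small weight matrix. It is not taken from the programs described here; the weights are invented, and the sketch assumes that every object has at least one neighbour, so that no row sum is zero.

```python
import math

def c_coding(G):
    """Globally standardized: C = (n / D) * G, D being the sum of all weights."""
    n = len(G)
    total = sum(sum(row) for row in G)
    return [[n / total * g for g in row] for row in G]

def w_coding(G):
    """Row-sum standardized: every row is divided by its linkage degree d_i."""
    return [[g / sum(row) for g in row] for row in G]

def s_coding(G):
    """Variance stabilizing: rows are divided by q_i, then rescaled by n / Q."""
    n = len(G)
    q = [math.sqrt(sum(g * g for g in row)) for row in G]
    s_star = [[g / q[i] for g in row] for i, row in enumerate(G)]
    Q = sum(sum(row) for row in s_star)
    return [[n / Q * s for s in row] for row in s_star]

# Hypothetical symmetric weights with zero diagonal.
G = [[0.0, 0.6, 0.3],
     [0.6, 0.0, 0.5],
     [0.3, 0.5, 0.0]]

C, W, S = c_coding(G), w_coding(G), s_coding(G)
for name, V in (("C", C), ("W", W), ("S", S)):
    print(name, [[round(v, 3) for v in row] for row in V])
```

In all three schemes the elements of the standardized matrix sum to n, which makes statistics computed under different schemes comparable in scale.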

Local Spatial Structures
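The local spatial weight matrix for the i-th spatial object consists of a copy of the i-th row and column of the global spatial weight matrix and zeros elsewhere. This definition leads to the following star-shaped form for the standardized local spatial weight matrices:
$$V_i \equiv s_i \begin{pmatrix} 0 & \cdots & g_{1i} & \cdots & 0 \\ \vdots & & \vdots & & \vdots \\ g_{i1} & \cdots & g_{ii} & \cdots & g_{in} \\ \vdots & & \vdots & & \vdots \\ 0 & \cdots & g_{ni} & \cdots & 0 \end{pmatrix}.$$
The scaling coefficient $s_i$ depends on the applied coding scheme (see Tiefelsdorf, 2000, p. 35):
$$s_i = \begin{cases} n^2/(2D), & \text{for the C-coding scheme,} \\ n/(2d_i), & \text{for the W-coding scheme,} \\ n^2/(2Qq_i), & \text{for the S-coding scheme.} \end{cases}$$
Up to the factor $1/n$, the sum over all local spatial weight matrices equals the symmetrized global spatial weight matrix, i.e.
$$\frac{1}{n} \sum_{i=1}^{n} V_i = \tfrac{1}{2}\,(V + V^T). \qquad (1)$$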
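A small numerical check can make the star-shaped construction concrete. The Python sketch below is not part of the EViews or MATLAB programs. It builds the local matrices for the W-coding scheme with $s_i = n/(2d_i)$ and compares their sum, divided by n, with the symmetrized global matrix. The weights are made up, and a symmetric G with zero diagonal is assumed.

```python
def local_matrix(G, i, s_i):
    """Star-shaped matrix: s_i times row i and column i of G, zeros elsewhere."""
    n = len(G)
    return [[s_i * G[k][l] if (k == i or l == i) else 0.0
             for l in range(n)] for k in range(n)]

# Hypothetical symmetric weights with zero diagonal.
G = [[0.0, 0.6, 0.3],
     [0.6, 0.0, 0.5],
     [0.3, 0.5, 0.0]]
n = len(G)

d = [sum(row) for row in G]                               # linkage degrees d_i
W = [[g / d[i] for g in row] for i, row in enumerate(G)]  # W-coded global matrix

# One local matrix per object, scaled with the W-coding coefficient n / (2 d_i).
local_mats = [local_matrix(G, i, n / (2 * d[i])) for i in range(n)]

# Average of the local matrices versus the symmetrized global matrix.
avg = [[sum(Vi[k][l] for Vi in local_mats) / n for l in range(n)]
       for k in range(n)]
sym = [[(W[k][l] + W[l][k]) / 2 for l in range(n)] for k in range(n)]

max_gap = max(abs(avg[k][l] - sym[k][l]) for k in range(n) for l in range(n))
print("largest deviation:", max_gap)
```

The deviation vanishes up to rounding error, which is the additivity property that later links the local statistics to the global one.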

Global Moran's I
Having the spatial structure summarized in a spatial weight matrix, it is possible to define a measure for the spatial autocorrelation. The most prominent one is a statistic developed by Moran (1948), which is called Moran's I in the literature. According to Tiefelsdorf (2002) it is defined as follows.
Let $y$ be a system of $n$ spatially distributed observations that are related to a set of $k$ exogenous variables via a linear regression model $y = X\beta + \varepsilon$. The $n \times k$ design matrix $X$ includes the usual constant vector. The disturbances are distributed as $\varepsilon \sim N(0, \sigma^2 \Omega)$.
The $n \times n$ matrix $\Omega$ reflects the covariance structure and $\sigma^2$ the variance of the disturbances. For independent disturbances the covariance structure is $\sigma^2 I$. The OLS regression residuals $\hat{\varepsilon} = My$ are distributed as $\hat{\varepsilon} \sim N(0, \sigma^2 M \Omega M)$ using the projection matrix $M \equiv I - X(X^T X)^{-1} X^T$. For an underlying autoregressive spatial process $\Omega \equiv (I - \rho V)^{-1} (I - \rho V^T)^{-1}$, where $\rho$ is the spatial autocorrelation coefficient. Global Moran's I is defined as a scale invariant ratio of quadratic forms in the normally distributed vector of regression residuals $\hat{\varepsilon}$:
$$I = \frac{\hat{\varepsilon}^T\, \tfrac{1}{2}(V + V^T)\, \hat{\varepsilon}}{\hat{\varepsilon}^T \hat{\varepsilon}}. \qquad (2)$$
Using $\tfrac{1}{2}(V + V^T)$ in the numerator ensures symmetry of the standardized spatial weight matrix.
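The ratio of quadratic forms can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not the implementation in the programs: the design matrix is taken to contain only the constant vector, so the OLS residuals reduce to deviations from the mean, and both the W-coded weights (four objects on a line) and the data are invented.

```python
def morans_i(V, e):
    """Global Moran's I: e' [(V + V') / 2] e divided by e' e."""
    n = len(e)
    num = sum(e[i] * (V[i][j] + V[j][i]) / 2 * e[j]
              for i in range(n) for j in range(n))
    den = sum(x * x for x in e)
    return num / den

def residuals_constant_only(y):
    """OLS residuals when X holds only the constant: y minus its mean."""
    mean = sum(y) / len(y)
    return [v - mean for v in y]

# Hypothetical W-coded weights: four objects on a line, adjacent ones linked.
V = [[0.0, 1.0, 0.0, 0.0],
     [0.5, 0.0, 0.5, 0.0],
     [0.0, 0.5, 0.0, 0.5],
     [0.0, 0.0, 1.0, 0.0]]

trend = residuals_constant_only([1.0, 2.0, 3.0, 4.0])        # smooth gradient
alternating = residuals_constant_only([1.0, 4.0, 1.0, 4.0])  # checkerboard

print("smooth pattern:     ", morans_i(V, trend))
print("alternating pattern:", morans_i(V, alternating))
```

Similar neighbouring values yield a positive statistic, while an alternating pattern yields a negative one.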

Local Moran's I
Frequently the spatial dependencies are not the same for the whole dataset. Therefore, it is advisable to calculate a local test statistic. Local Moran's $I_i$ is defined by using the local spatial weight matrix introduced in Section (2.3), i.e.
$$I_i = \frac{\hat{\varepsilon}^T V_i\, \hat{\varepsilon}}{\hat{\varepsilon}^T \hat{\varepsilon}}.$$
Due to the additivity property in (1), the global Moran's I can be expressed in terms of the sum over all local Moran's $I_i$, namely $I = \frac{1}{n} \sum_{i=1}^{n} I_i$.
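The relation between local and global statistics can be checked numerically. The Python sketch below is illustrative only; it assumes the W-coding scheme with scaling coefficient $s_i = n/(2d_i)$, a weight matrix with zero diagonal, residuals from a constant-only regression, and invented data for four objects on a line.

```python
def local_morans_i(G, s, e):
    """Local Moran's I_i = e' V_i e / (e' e).

    V_i is s_i times row i and column i of G. The diagonal of G is
    assumed to be zero, so the term j == i is skipped.
    """
    n = len(e)
    den = sum(x * x for x in e)
    result = []
    for i in range(n):
        cross = sum((G[i][j] + G[j][i]) * e[j] for j in range(n) if j != i)
        result.append(s[i] * e[i] * cross / den)
    return result

# Hypothetical binary adjacency for four objects on a line.
G = [[0.0, 1.0, 0.0, 0.0],
     [1.0, 0.0, 1.0, 0.0],
     [0.0, 1.0, 0.0, 1.0],
     [0.0, 0.0, 1.0, 0.0]]
n = len(G)
d = [sum(row) for row in G]            # linkage degrees d_i
s = [n / (2 * di) for di in d]         # W-coding scaling coefficients

y = [1.0, 2.0, 3.0, 4.0]
mean = sum(y) / n
e = [v - mean for v in y]              # residuals, constant-only regression

I_local = local_morans_i(G, s, e)
I_global = sum(I_local) / n
print("local values:", I_local)
print("global value:", I_global)
```

In this example the two border objects contribute more than the interior ones, which is the kind of heterogeneity a purely global statistic hides.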

Moments of Moran's I
The moments of the global and local Moran's I statistic are calculated under the assumption of spatial independence. The moments of I can be expressed in terms of the resequenced spectrum of eigenvalues {λ_1, ...,

Distribution of Moran's I
The z-transformed observed Moran's $I_o$ is asymptotically normally distributed, i.e.
$$z(I_o) = \frac{I_o - E[I]}{\sqrt{\mathrm{Var}[I]}} \sim N(0, 1).$$
Tiefelsdorf (2000) was the first to publish the exact reference distribution of Moran's I. Its evaluation is complicated and computationally expensive. He proposed a saddlepoint approximation which is fast and accurate. Our software implementation yields the same results as the SPSS macro on the author's homepage. For the mathematical background of the saddlepoint approximation see e.g. Tiefelsdorf (2002) or Ferstl (2004).

R. Ferstl 21

Moran Scatterplots
Anselin (1996) describes the Moran scatterplot as a tool for exploratory spatial data analysis (ESDA). Using the definition of global Moran's I in (2), it is possible to interpret I as the coefficient in a regression of the spatially lagged variable $\tfrac{1}{2}(V + V^T)\hat{\varepsilon}$ on $\hat{\varepsilon}$. In other words, the slope of the regression line equals global Moran's I.
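The regression interpretation can be verified with a minimal Python sketch. It is not the plotting code of the programs; it only computes the points of a Moran scatterplot and the slope of the fitted line, using invented W-coded weights and residuals from a constant-only regression.

```python
def spatial_lag(V, e):
    """Spatially lagged residuals, (V + V') / 2 times e."""
    n = len(e)
    return [sum((V[i][j] + V[j][i]) / 2 * e[j] for j in range(n))
            for i in range(n)]

def ols_slope(x, y):
    """Slope of the least squares line of y on x (with intercept)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Hypothetical W-coded weights: four objects on a line.
V = [[0.0, 1.0, 0.0, 0.0],
     [0.5, 0.0, 0.5, 0.0],
     [0.0, 0.5, 0.0, 0.5],
     [0.0, 0.0, 1.0, 0.0]]
e = [-1.5, -0.5, 0.5, 1.5]   # made-up residuals with mean zero

lag = spatial_lag(V, e)
slope = ols_slope(e, lag)

# Moran's I as a ratio of quadratic forms, for comparison.
moran = sum(a * b for a, b in zip(e, lag)) / sum(a * a for a in e)

for point in zip(e, lag):
    print(point)             # coordinates of the scatterplot points
print("slope:", slope, "Moran's I:", moran)
```

Because the residuals of a regression with a constant have mean zero, the slope agrees with the ratio of quadratic forms whether or not the fitted line includes an intercept.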
Moran scatterplots can be used to find local patterns of spatial association. They also provide an easy way to detect outliers. Additionally, our programs plot a smoothed curve through the data points (instead of the regression line), and line diagrams of the studentized residuals and the Cook's distances. It is also possible to generate a so-called Moran scatterplot matrix, which helps to visualize spatial interactions between two or more variables (see Anselin et al., 2002).

Spatial Filtering
After having successfully detected spatial autocorrelation in the data, the question is how to handle it. The simplest alternative is to spatially filter the data and estimate the model with the usual OLS procedure.

The $G_i$ Statistic
A spatial filter is based on a local statistic of spatial dependence. In contrast to the version developed by Getis and Ord (1992), we use a non-binary weight matrix as in Section (2.1). This extension was introduced by Getis and Ord (1995) and was recently used in Badinger et al. (2004):
$$G_i(\delta) = \frac{\sum_{j} v_{ij}(\delta)\, x_j}{\sum_{j} x_j}, \qquad j \neq i.$$
The $v_{ij}$ terms are the elements of a standardized weight matrix $V$. The $x_j$ represent observations of a random variable $X_j$. The expected value of the $G_i$ statistic is given by
$$E[G_i(\delta)] = \frac{\sum_{j} v_{ij}(\delta)}{n - 1}. \qquad (3)$$
Like Moran's I, the z-standardized statistic $z(G_i)$ is asymptotically $N(0, 1)$ distributed. The $G_i^*$ statistic differs from the $G_i$ statistic by including the value of the i-th observation itself, i.e. the case $j = i$.

Austrian Journal of Statistics, Vol. 36 (2007), No. 1, 17-26
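The following Python sketch illustrates the $G_i$ statistic, its expected value and the ratio used by the filtering process described in the next section. It is a simplified illustration and not the code of the programs: the weights and data are invented, the observations are assumed to be strictly positive so that all ratios are well defined, and the distance decay parameter is treated as fixed rather than optimized.

```python
def g_statistic(V, x):
    """G_i: weighted sum of the other observations over their plain sum."""
    n = len(x)
    result = []
    for i in range(n):
        num = sum(V[i][j] * x[j] for j in range(n) if j != i)
        den = sum(x[j] for j in range(n) if j != i)
        result.append(num / den)
    return result

def g_expected(V):
    """Expected value of G_i without autocorrelation: sum_j v_ij / (n - 1)."""
    n = len(V)
    return [sum(V[i][j] for j in range(n) if j != i) / (n - 1)
            for i in range(n)]

def spatial_filter(V, x):
    """Filtered variable x* = x * E[G_i] / G_i and spatial part L = x - x*."""
    g = g_statistic(V, x)
    eg = g_expected(V)
    x_star = [xi * ei / gi for xi, ei, gi in zip(x, eg, g)]
    spatial = [xi - xs for xi, xs in zip(x, x_star)]
    return x_star, spatial

# Hypothetical W-coded weights: four objects on a line.
V = [[0.0, 1.0, 0.0, 0.0],
     [0.5, 0.0, 0.5, 0.0],
     [0.0, 0.5, 0.0, 0.5],
     [0.0, 0.0, 1.0, 0.0]]
x = [10.0, 12.0, 30.0, 35.0]   # made-up positive observations

x_star, L = spatial_filter(V, x)
print("G_i:     ", [round(v, 4) for v in g_statistic(V, x)])
print("E[G_i]:  ", [round(v, 4) for v in g_expected(V)])
print("filtered:", [round(v, 4) for v in x_star])
print("spatial: ", [round(v, 4) for v in L])
```

For a spatially constant variable the statistic equals its expected value, so the filter leaves the data unchanged and the spatial component is zero.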

The Filtering Process
The expected value in (3) represents the realized value in the i-th region when no autocorrelation occurs. Dividing it by the $G_i$ statistic results in a ratio that represents the spatially uncorrelated part of the data, i.e. the filtered variable
$$x_i^* = x_i\, \frac{E[G_i(\delta)]}{G_i(\delta)}.$$
The purely spatial effects are stored in a new variable $L = X - X^*$. The goal is to minimize the remaining spatial autocorrelation in the filtered variable by varying the distance decay parameter $\delta$, which leads to an optimization problem in $\delta$.

Programs
Table 1 shows a list of available program files for EViews and MATLAB. Detailed descriptions of the syntax are included as comments. Demos of the functions can be found in the example files mentioned in Table 2.
The following examples are taken from Ferstl (2004). The analyzed series represents the gross value added per capita in million ECU in 1975 based on price and exchange rate levels from 1990. The $i = 1, \ldots, 194$ spatial objects are NUTS 2 regions of the EU-15 countries. Table 3 shows the result of a test for global spatial autocorrelation. To perform the calculations in EViews these commands must be entered in the command line. The analyzed variable shows a strong positive and significant spatial autocorrelation. A look at a Moran scatterplot yields the same conclusion. Figure 1 also includes measures to identify influential observations. The following code constructs a Moran scatterplot in EViews.

Conclusions
In this paper we briefly present a collection of programs for spatial data analysis that we developed in Ferstl (2004). The idea was to offer users of the widespread econometric software package EViews simple functions to analyze data for spatial dependencies. We have implemented functions to calculate Moran's I and its central moments using the normal and the saddlepoint approximation. Influential observations and spatial clusters can be visualized by Moran scatterplots. One way to deal with the presence of spatial autocorrelation is to filter the data. Functions for this purpose are available as well.
All programs are available on request by email from the authors. They have been tested under Windows 2000/XP in EViews 4/5 and MATLAB 6/7.

' Open dataset:
open Data\eu.wf1
' Calculate global and local Moran's I:
run moransad ser01 const u w g l j

Figure 1: Moran Scatterplot with statistics for outlier detection

Figure 2: Classified choropleth map of the $G_i^*$ statistics

Table 3: Global Moran's I with normal and saddlepoint approximation