Spatial Temporal Conditional Auto-regressive Model: a New Autoregressive Matrix

In the study of geographical patterns of disease, the multivariate areal data models proposed so far in the literature (Ma and Carlin, 2007; Carlin and Banerjee, 2003; Knorr-Held and Best, 2001) make it possible to handle several features of a phenomenon at the same time. In this paper, we propose a new model for areal data, the Spatial Temporal Conditional Auto-Regressive (STCAR) model, which handles the spatial dependence between sites as well as the temporal dependence among the realizations when measurements are recorded at each spatial location over a time interval. Inspired by the Generalized Multivariate Conditional Auto-Regressive (GMCAR) model of Jin, Carlin, and Banerjee (2005), the STCAR model reduces the unknown parameters to a single parameter of spatial association estimated for each period considered. In addition, unlike the Vector Auto-Regressive (VAR) model proposed by Sims (1980), its space-time autoregressive matrix takes into account the spatial location of the sampled realizations. The main areas of application of these models are disease mapping, disease clustering, and ecological analysis (Lawson, Browne, and Vidal Rodeiro, 2003). In this work, however, the STCAR model is applied to business, exploiting the analogy between the danger of contracting a particular disease and the risk of falling into bankruptcy, in order to "reconstruct" the spatial-temporal distribution of expected bankruptcies of small and medium enterprises in the province of Lecce (Italy).


Leonardo Mariella and Marco Tarantino
University of Salento, Lecce, Italy
In this paper, we propose a new space-time model, called the Spatial Temporal Conditional Auto-Regressive (STCAR) model, which directly specifies the joint distribution of a sequence of Markov random fields (Cressie, 1993) via conditional and marginal distributions, using information derived from the temporal evolution of the phenomenon. In particular, the STCAR model is constructed through a space-time autoregressive matrix that assigns a temporal coefficient to the same location sampled at different instants, a spatial coefficient to nearby locations sampled at the same instant, and the product of a temporal coefficient and a spatial coefficient to nearby locations sampled at different instants. Unlike the Generalized Multivariate Conditional Auto-Regressive (GMCAR) model proposed in Jin et al. (2005), the STCAR model is used to treat not several features at the same instant, but the same feature recorded over a time interval. This peculiarity reduces the number of parameters of the GMCAR model to a single parameter of spatial association estimated for each period; this in turn leads to a significant reduction in the computational burden of hierarchical spatial random effect modeling. Moreover, our space-time autoregressive matrix differs from that of the Vector Auto-Regressive (VAR) model proposed in Sims (1980), since the coefficients of the STCAR model also account for the proximity of spatial locations.
From a practical point of view, CAR models are usually used in the fields of medicine and public health. In this paper, however, the model is unusually applied to business, in order to deal with cases of bankruptcy of small and medium enterprises in the province of Lecce (Italy) over a time interval of four years. The commercial success of a small retail enterprise, in fact, may depend on both internal factors, such as the close link between the goods and/or services offered and market trends, and external factors, such as the presence in the immediate vicinity of big malls, which create areas that are off limits to small neighboring enterprises. Consequently, the variation in the risk of insolvency of an enterprise may be analyzed through a series of thematic maps, known in the literature as disease maps (Lawson et al., 2003), in order to identify any areas with high rates of bankruptcy. The aim is to submit these areas to a more detailed examination to determine whether the presence of a nearby shopping center is, over time, the cause of a high concentration of bankruptcies. More specifically, to construct the maps of risk in the present work, it was first necessary to aggregate the cases of bankruptcy by the natural partition of Salento into the 97 municipalities of the province of Lecce, for each year between 2000 and 2003. Then, the distribution of the risk of bankruptcy of a local enterprise was obtained by using the STCAR model as the distribution of the random effects in the mean structure of a hierarchical model, through a particular Markov Chain Monte Carlo (MCMC) method known as the Gibbs sampler (Böhning and Sarol, 2000a, 2000b). The parameters of spatial association were estimated using the Maximum Likelihood technique and computed by a formal iterative procedure, known as the Newton-Raphson procedure (Ord, 1975), while those of temporal association were estimated using the Weighted Least Squares method.
From a computational point of view, the STCAR model has been implemented in a Bayesian framework via the WinBUGS software (Spiegelhalter, Thomas, Best, and Lunn, 2005) and, in particular, the GeoBUGS package (Thomas, Best, Lunn, Arnold, and Spiegelhalter, 2004), while the space-time parameters have been estimated via, respectively, the Scilab software (Antonelli and Chiaverini, 2009) and the R software (R Development Core Team, 2008).
The rest of this article is organized as follows. Section 2 provides background information on the CAR model. Section 3 describes the new STCAR model for the analysis of space-time data, and Section 4 presents the estimation procedures for the parameters of the proposed model. The STCAR model is then illustrated in Section 5 using data on bankruptcies in the province of Lecce from 2000 to 2003, through the construction, implementation and validation of a Hierarchical Bayesian model. Finally, Section 6 presents the conclusions of this work.

Methodological Background: CAR Model
The Conditional Auto-Regressive (CAR) model, known in the literature as the Auto-Normal model or Gauss-Markov model (Chellappa, 1985), is usually used for "local investigations", i.e. it makes it possible to analyze phenomena that occur in the geographical area immediately surrounding the site analyzed. More specifically, the CAR model is a continuous Markov random field model characterized by a conditional probability density function and particularly suited to modeling spatial phenomena strongly tied to a specific local context (Besag, 1974; Cressie, 1993). Its utility is also largely attributed to the existence of a clear link between the conditional probability distributions and the joint probability distribution (Besag, 1974; Smith, 2001).
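We consider a spatial domain $S = \{1, \dots, n\}$ and the neighborhood $N_i$ of a site $i \in S$, i.e.

$$N_i = \{\, i^* \in S : i^* \text{ is a neighbor of } i \,\}, \qquad i \notin N_i .$$

Assigned a random variable $X_i$, $i \in S$, we define the corresponding random field $X$, i.e.

$$X = (X_1, X_2, \dots, X_n)' .$$

A Conditional Auto-Regressive model for $X$ is specified through:
• the conditional probability density function

$$f(x_i \mid x_{N_i}) = \frac{1}{\sqrt{2\pi\sigma_i^2}} \exp\left\{ -\frac{1}{2\sigma_i^2} \Big[ x_i - \mu_i - \sum_{i^* \in N_i} \beta_{ii^*} (x_{i^*} - \mu_{i^*}) \Big]^2 \right\}; \tag{1}$$

• the joint probability density function

$$f(x) = (2\pi)^{-n/2} \det(B^{-1}\Sigma_D)^{-1/2} \exp\left\{ -\tfrac{1}{2} (x - \mu)' \, \Sigma_D^{-1} B \, (x - \mu) \right\}, \tag{2}$$

where $\mu$ is an $n$-dimensional vector, i.e. $\mu = (\mu_1, \mu_2, \dots, \mu_n)'$, $B = I - \beta$ with $\beta = (\beta_{ii^*})$ the matrix of interaction ($\beta_{ii^*} = 0$ if $i^* \notin N_i$), and $\Sigma_D$ is an $n \times n$ diagonal matrix, i.e. $\Sigma_D = \mathrm{diag}(\sigma_1^2, \sigma_2^2, \dots, \sigma_n^2)$.

Thus, a Conditional Auto-Regressive model $X$ can be expressed equivalently in terms of:
• the conditional probability density function (1), denoted as follows:

$$X_i \mid x_{N_i} \sim N\Big( \mu_i + \sum_{i^* \in N_i} \beta_{ii^*} (x_{i^*} - \mu_{i^*}), \; \sigma_i^2 \Big); \tag{3}$$

• the joint probability density function (2), denoted as follows:

$$X \sim N(\mu, \, B^{-1}\Sigma_D). \tag{4}$$

A necessary and sufficient condition for (4) to be a valid joint probability density function is that its covariance matrix is not only symmetric, but also positive definite. For this reason, we must define:
• a symmetric weighted adjacency matrix $W = (w_{ii^*})$, i.e.

$$w_{ii^*} = \begin{cases} \phi(i, i^*) & \text{if } i^* \in N_i, \\ 0 & \text{otherwise,} \end{cases} \tag{5}$$

where $\phi(i, i^*)$ is a measure that quantifies the proximity between the sites $i$ and $i^*$;
• a diagonal normalization matrix $W_D$ of the following type:

$$W_D = \mathrm{diag}(w_{1+}, w_{2+}, \dots, w_{n+}), \qquad w_{i+} = \sum_{i^* \in N_i} w_{ii^*} .$$

Then, taking the matrix of interaction $\beta$ to be a normalized adjacency matrix, i.e.

$$\beta = \rho \, W_D^{-1} W ,$$

and taking the matrix $\Sigma_D$ to be a constant diagonal matrix, normalized in the same way, i.e.

$$\Sigma_D = \sigma^2 \, W_D^{-1} ,$$

the model (4) can be expressed as follows:

$$X \sim N\big( \mu, \; \sigma^2 (W_D - \rho W)^{-1} \big). \tag{6}$$

The constant value $\sigma^2$ represents the overall variability, while the value of the parameter of spatial association $\rho$, suitably chosen (Carlin and Banerjee, 2003; Cressie, 1993; Sun, Tsutakawa, and Speckman, 1999), represents the overall effect of spatial dependence.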
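As a minimal numerical sketch of the construction above, the following Python code builds the covariance $\sigma^2 (W_D - \rho W)^{-1}$ from a symmetric weight matrix and checks that it is symmetric and positive definite. The three-site chain, the binary weights and the function name `car_covariance` are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def car_covariance(W, rho, sigma2):
    """Covariance sigma2 * (W_D - rho * W)^{-1} of a CAR field (illustrative helper)."""
    W = np.asarray(W, dtype=float)
    if not np.allclose(W, W.T):
        raise ValueError("W must be symmetric")
    W_D = np.diag(W.sum(axis=1))          # row sums w_{i+} on the diagonal
    precision = (W_D - rho * W) / sigma2  # symmetric by construction
    # The joint density is valid only if the precision is positive definite.
    if np.any(np.linalg.eigvalsh(precision) <= 0):
        raise ValueError("rho gives a precision matrix that is not positive definite")
    return np.linalg.inv(precision)

# Hypothetical chain of three sites, 1 - 2 - 3, with binary proximity weights.
W = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
Sigma = car_covariance(W, rho=0.5, sigma2=1.0)

# Cross-check against the form B^{-1} Sigma_D, with B = I - rho * W_D^{-1} W
# and Sigma_D = sigma2 * W_D^{-1}.
W_D = np.diag(W.sum(axis=1))
B = np.eye(3) - 0.5 * np.linalg.inv(W_D) @ W
Sigma_alt = np.linalg.inv(B) @ (1.0 * np.linalg.inv(W_D))
```

Working with the precision matrix $W_D - \rho W$ keeps the symmetry check trivial; the product form $B^{-1}\Sigma_D$ is computed only to confirm that the two expressions agree.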
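For the extension to the space-time case, it will be useful to express the model (4) in matrix terms, i.e.

$$X = \mu + \beta (X - \mu) + \varepsilon ,$$

where $\varepsilon$ is a vector, called the vector of pseudo errors, defined as follows:

$$\varepsilon = B (X - \mu), \qquad B = I - \beta .$$

Note that although the components of the vector of pseudo errors are not independent, i.e.

$$E(\varepsilon) = 0, \qquad \mathrm{var}(\varepsilon) = \Sigma_D B' ,$$

the error in a given location is independent of the auto-regressive random variable in the nearby location, since the covariance between the vector of pseudo errors and the random field is equal to a diagonal matrix:

$$\mathrm{cov}(\varepsilon, X) = \Sigma_D .$$

Furthermore, the Normal distribution on $X$ "induces" the distribution on $\varepsilon$, i.e.

$$\varepsilon \sim N(0, \, \Sigma_D B')$$

or, in more detail, under the specification (6),

$$\varepsilon \sim N\big( 0, \; \sigma^2 \, W_D^{-1} (W_D - \rho W) W_D^{-1} \big).$$

Austrian Journal of Statistics, Vol. 39 (2010), No. 3, 223-244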

A New Approach: STCAR Model
The temporal analysis of spatially referenced data led us to formulate a new model, called the Spatial Temporal Conditional Auto-Regressive (STCAR) model, in order to handle the time evolution of a simple Conditional Auto-Regressive (CAR) model. Several multivariate areal models have been proposed so far (Mardia, 1988; Kim, Sun, and Tsutakawa, 2001; Carlin and Banerjee, 2003). Most recently, Jin et al. (2005) introduced a new flexible class of Generalized Multivariate CAR (GMCAR) models for areal data and showed how it enriches the existing Multivariate CAR (MCAR) class (Gelfand and Vounatsou, 2003). Their method directly specifies the joint distribution of a multivariate Markov random field through the specification of simpler conditional and marginal models, in order to treat several features at the same instant. In particular, they model the death rates from lung and esophagus cancers in the years from 1991 to 1998 in Minnesota counties, a setting in which association would be expected both within and across the areal units.
In this article, we use the same procedure in order to treat the same feature recorded over a time interval. This involves the use of a space-time autoregressive matrix that handles the spatial dependence between sites as well as the temporal dependence among the realizations. In particular, unlike the Vector Auto-Regressive (VAR) model proposed in Sims (1980) to capture the evolution of, and the interdependencies between, multiple time series, this space-time autoregressive matrix also accounts for the proximity of spatial locations. All the variables in a VAR model are treated symmetrically, by including for each variable an equation explaining its evolution based on its own lags and the lags of all the other variables; in our STCAR model, however, this equation is built bearing in mind also the neighborhood criterion applied to the sampled locations.
Finally, we employ STCAR distributions as specifications for second-stage random effects in a Bayesian framework, with an application to modeling the bankruptcy of small and medium enterprises in the years from 2000 to 2003 in the province of Lecce (Italy).
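To illustrate this approach, we begin with the case of areal data collected in three locations and in two periods. In particular, we hypothesize a spatial domain $S = \{1, 2, 3\}$, with $N_1 = \{2\}$, $N_2 = \{1, 3\}$, $N_3 = \{2\}$, and a temporal domain $T = \{1, 2\}$, and we consider the random fields $Z_1$ and $Z_2$ present, respectively, at periods 1 and 2, i.e.

$$Z_1 = (Z_{11}, Z_{21}, Z_{31})', \qquad Z_2 = (Z_{12}, Z_{22}, Z_{32})' .$$

For ease of exposition, we define the random fields $Z_1$ and $Z_2$ as the temporal sequence of two CAR models of type (6) with expected value zero, i.e.

$$Z_t \sim N\big( 0, \; \sigma_t^2 (W_D - \rho_t W)^{-1} \big), \qquad t = 1, 2,$$

and we write $B_t = I - \rho_t W_D^{-1} W$ for the corresponding matrix $B$ at period $t$.

In the presence of a vector $(Z_1, Z_2)$ of random fields which follows a multivariate normal distribution, i.e.

$$\begin{pmatrix} Z_1 \\ Z_2 \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix} \right),$$

where $\Sigma_{21} = \Sigma_{12}'$, the joint probability density function can be obtained as the product of the conditional probability density function and the marginal probability density function, i.e.

$$f(z_1, z_2) = f(z_2 \mid z_1) \, f(z_1) .$$

In other words, the evolution of a phenomenon from time 1 to time 2 is obtained through the information available at time 1 and the information available at time 2, given what happened previously. Consequently, from the theory of the multivariate Normal distribution, we have that

$$Z_2 \mid Z_1 = z_1 \sim N\big( \Sigma_{21} \Sigma_{11}^{-1} z_1, \; \Sigma_{22} - \Sigma_{21} \Sigma_{11}^{-1} \Sigma_{12} \big) \tag{9}$$

and, supposing that $C_1 = \Sigma_{21} \Sigma_{11}^{-1}$, we can write the distribution (9) as follows:

$$Z_2 \mid Z_1 = z_1 \sim N\big( C_1 z_1, \; \Sigma_{22} - C_1 \Sigma_{12} \big).$$

Our model is constructed so that the vector $Z_2$ at time 2 depends linearly on its own value lagged by one period, $Z_1$, i.e.

$$Z_2 = C_1 Z_1 + \delta_2 , \tag{11}$$

where
• $C_1$ is a $(3 \times 3)$ matrix, called the space-time autoregressive matrix, built from a time component $r_1$ and a space component $B_2^{-1} B_1$, i.e.

$$C_1 = r_1 \, B_2^{-1} B_1 ;$$

• $\delta_2$ is the vector of pseudo errors structured with a space component $B_2^{-1}$, i.e.

$$\delta_2 = B_2^{-1} \varepsilon_2 .$$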
In other words, the model (11) can be expressed as

$$B_2 Z_2 = r_1 B_1 Z_1 + \varepsilon_2 \tag{12}$$

or, in more detail, for each site $i \in S$,

$$Z_{i2} = \rho_2 \sum_{i^* \in N_i} \frac{w_{ii^*}}{w_{i+}} Z_{i^*2} + r_1 Z_{i1} - r_1 \rho_1 \sum_{i^* \in N_i} \frac{w_{ii^*}}{w_{i+}} Z_{i^*1} + \varepsilon_{i2} ,$$

where $B_t = I - \rho_t W_D^{-1} W$, $w_{ii^*}$ are the entries of $W$ and $w_{i+}$ are the diagonal entries of $W_D$. So, to treat the same feature recorded over a time interval, the space-time autoregressive matrix assigns a temporal coefficient to the same location sampled at different instants, a spatial coefficient to nearby locations sampled at the same instant, and the product of a temporal coefficient and a spatial coefficient to nearby locations sampled at different instants. It follows that
• excluding the temporal dependence, i.e. $r_1 = 0$, the model (12) is reduced to a CAR model, $B_2 Z_2 = \varepsilon_2$;
• excluding the spatial dependence, i.e. $\rho_1 = 0$ and $\rho_2 = 0$, the model (12) is reduced to a linear regression model, $Z_2 = r_1 Z_1 + \varepsilon_2$.

Note that, in the analysis of several features at the same instant conducted in Jin et al. (2005) and in Ma and Carlin (2007), the autoregressive matrix and the vector of pseudo errors differ from those proposed in the model (11). Indeed, the matrix $C_1$ of the GMCAR model is a matrix of the following type:

$$C_1 = \eta_0 I + \eta_1 W ,$$

where $\eta_0$ and $\eta_1$ are the "bridging" parameters; in particular, $\eta_0$ associates pairs of areal random effects defined on the same areal unit, while $\eta_1$ associates areal random effects among neighboring units. The vector $\delta_2$, however, is reduced to the vector of pseudo errors $\varepsilon_2$ of a CAR model.

At this point, the generalization of the model (12) is straightforward. The proposed model, expressed in a completely original definition, handles the spatial dependence between sites as well as the temporal dependence among the realizations, respectively through a CAR model and a linear regression model.

Definition 3.1 (Spatial Temporal Conditional Auto-Regressive model). Suppose that $S \subseteq \mathbb{R}^d$, $d \in \mathbb{N}^+$, and $T \subseteq \mathbb{R}$ are, respectively, a spatial domain and a temporal domain. Assigned a random variable $Z_{it}$, $i \in S$, $t \in T$, we consider the temporal sequence of Conditional Auto-Regressive models with expected value zero, i.e.

$$Z_t = (Z_{1t}, Z_{2t}, \dots, Z_{nt})' \sim N\big( 0, \; \sigma_t^2 (W_D - \rho_t W)^{-1} \big), \qquad t \in T .$$

A vector of random fields $Z_t$, $t \in T$, is called a Spatial Temporal Conditional Auto-Regressive model of order $p$ (STCAR($p$)) if, for every $t$,

$$Z_t = \sum_{k=1}^{p} r_k \, B_t^{-1} B_{t-k} \, Z_{t-k} + B_t^{-1} \varepsilon_t , \tag{13}$$

where $\varepsilon_t$ is the vector of pseudo errors.

The joint probability density function of a STCAR model of order $p$ can be obtained as follows:
• the marginal distribution of $Z_1 = (Z_{11}, Z_{21}, \dots, Z_{n1})'$ is of the following type:

$$Z_1 \sim N\big( 0, \; \sigma_1^2 (W_D - \rho_1 W)^{-1} \big);$$

• the conditional distribution of $Z_t$ given the previous fields has, by (13), expected value

$$E(Z_t \mid z_{t-1}, \dots, z_{t-p}) = \sum_{k=1}^{p} r_k \, B_t^{-1} B_{t-k} \, z_{t-k} .$$

Note that an element of absolute novelty with respect to the GMCAR model is the specification of its conditional expected value. In particular, since the model (13) is a space-time model, its expected value takes into account both the parameters of temporal association $r_1, r_2, \dots, r_p$ and the parameters of spatial association $\rho_t, \rho_{t-1}, \dots, \rho_{t-p}$. Moreover, the latter are the only spatial unknowns of the STCAR model, since we will show that the parameters of overall variability $\sigma_t^2, \sigma_{t-1}^2, \dots, \sigma_{t-p}^2$ are linked to the parameters of spatial association.
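The structure of the space-time autoregressive matrix for the three-site, two-period illustration can be sketched numerically as follows. This is a minimal sketch assuming binary weights on the chain 1 - 2 - 3 and $B_t = I - \rho_t W_D^{-1} W$; the function names and the parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np

def b_matrix(W, rho):
    """B = I - rho * W_D^{-1} W for a symmetric weight matrix W without isolated sites."""
    W = np.asarray(W, dtype=float)
    # Dividing row i by its row sum w_{i+} gives W_D^{-1} W.
    return np.eye(W.shape[0]) - rho * W / W.sum(axis=1, keepdims=True)

def st_autoregressive_matrix(W, r1, rho1, rho2):
    """C_1 = r1 * B_2^{-1} B_1, computed with a linear solve instead of an explicit inverse."""
    return r1 * np.linalg.solve(b_matrix(W, rho2), b_matrix(W, rho1))

# Hypothetical chain of three sites with binary weights.
W = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])

r1, rho1, rho2 = 0.6, 0.3, 0.4          # illustrative values only
C1 = st_autoregressive_matrix(W, r1, rho1, rho2)

z1 = np.array([1.0, -0.5, 0.2])          # a made-up realization at period 1
cond_mean = C1 @ z1                      # conditional mean of Z_2 given z1

# Limiting cases described in the text.
C1_no_time = st_autoregressive_matrix(W, 0.0, rho1, rho2)   # r1 = 0
C1_no_space = st_autoregressive_matrix(W, r1, 0.0, 0.0)     # rho1 = rho2 = 0
```

With $r_1 = 0$ the matrix vanishes, so only the spatial (CAR) part remains; with $\rho_1 = \rho_2 = 0$ it collapses to $r_1 I$, the coefficient matrix of a site-by-site linear regression on the previous period.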

Estimation of Space-Time Parameters
The STCAR model is constructed through a space-time autoregressive matrix that assigns a temporal coefficient to the same location sampled at different instants, a spatial coefficient to nearby locations sampled at the same instant, and the product of a temporal coefficient and a spatial coefficient to nearby locations sampled at different instants. Therefore, to estimate these unknown parameters, we first exclude the time dependence and estimate the space parameters through the Maximum Likelihood technique; we then exclude the space dependence and estimate the time parameters through the Weighted Least Squares method.
Assuming temporal independence, i.e. $r_1 = r_2 = \dots = r_p = 0$, the model (13) is reduced to a CAR model. In this case, the log-likelihood function for $\rho_t$ and $\sigma_t^2$, given $Z_t = z_t$, yields the maximum likelihood estimators:
• the estimator $\hat\sigma_t^2$ of the variance (18);
• the estimator $\hat\rho_t$ of the parameter of spatial association, i.e. the value which maximizes the log-likelihood once $\sigma_t^2$ has been replaced by $\hat\sigma_t^2$.

It is evident that the estimator $\hat\sigma_t^2$ depends on the estimator $\hat\rho_t$; moreover, the calculation of the latter hinges on the evaluation of the determinant of $B_t$, i.e.

$$\det(B_t) = \det(I - \rho_t W_D^{-1} W) .$$

In particular, if the normalized adjacency matrix $W_D^{-1} W$ has eigenvalues $\lambda_1, \lambda_2, \dots, \lambda_n$, it is well known that

$$\det(B_t) = \prod_{i=1}^{n} (1 - \rho_t \lambda_i) .$$

The eigenvalues $\lambda_1, \lambda_2, \dots, \lambda_n$ can be determined once and for all, so that $\hat\rho_t$ is the value of $\rho_t$ that minimizes the following function:

$$l(z_t; \rho_t, \hat\sigma_t^2) = \log \hat\sigma_t^2 - \frac{2}{n} \sum_{i=1}^{n} \log(1 - \rho_t \lambda_i), \tag{19}$$

where $l(z_t; \rho_t, \hat\sigma_t^2) = \log\big( \hat\sigma_t^2 \det(B_t)^{-2/n} \big)$. To determine $\hat\rho_t$, we use a formal iterative procedure, namely the Newton-Raphson procedure (Ord, 1975).
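The eigenvalue shortcut for the determinant can be illustrated with a short Python sketch. It assumes $B = I - \rho W_D^{-1} W$ and a hypothetical three-site chain with binary weights; the eigenvalues are taken from the symmetric matrix $W_D^{-1/2} W W_D^{-1/2}$, which is similar to $W_D^{-1} W$ and therefore has the same spectrum. The function names are illustrative.

```python
import numpy as np

def normalized_eigenvalues(W):
    """Eigenvalues of W_D^{-1} W, computed from the similar symmetric matrix
    W_D^{-1/2} W W_D^{-1/2} so that they come back as real numbers."""
    W = np.asarray(W, dtype=float)
    d = 1.0 / np.sqrt(W.sum(axis=1))
    return np.linalg.eigvalsh(d[:, None] * W * d[None, :])

def logdet_B(rho, eigenvalues):
    """log det(I - rho * W_D^{-1} W) = sum_i log(1 - rho * lambda_i)."""
    return float(np.sum(np.log1p(-rho * eigenvalues)))

# Hypothetical chain of three sites with binary weights.
W = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])

lam = normalized_eigenvalues(W)            # computed once, reused for every rho
rho_low, rho_high = 1.0 / lam.min(), 1.0 / lam.max()   # every factor 1 - rho*lambda_i is positive inside

rho = 0.5
fast = logdet_B(rho, lam)

# Direct evaluation for comparison.
B = np.eye(3) - rho * W / W.sum(axis=1, keepdims=True)
sign, direct = np.linalg.slogdet(B)
```

For this chain the eigenvalues are $-1$, $0$ and $1$, so $\det(B) = 1 - \rho^2$. Once `lam` is stored, each evaluation of the log-determinant inside an iterative search costs only a sum over $n$ terms.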
Assuming spatial independence, i.e. $\rho_t = \rho_{t-1} = \dots = \rho_{t-p} = 0$, the model (13) is reduced to a linear regression model:

$$Z_t = \sum_{k=1}^{p} r_k Z_{t-k} + \varepsilon_t .$$

Note that the $n$-dimensional vectors $\varepsilon_t$ are spatially homoscedastic, but temporally heteroscedastic, since the variance matrix depends on time $t$. To solve the problem of heteroscedasticity, we consider the relation (18) and assume that the variance matrix is a function of the regressor $z_t$ (relation (20)).
Therefore, weighting each term of the model by $|z_t|^{-1}$ removes this dependence. The model thus transformed satisfies all the assumptions of a classical linear regression model, and the Least Squares Estimators are both unbiased and efficient. The estimate is a Weighted Least Squares estimate, since every $t$-th element is weighted by the factor $|z_t|^{-1}$.
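The weighting step can be sketched for the simplest case of a single lag. The code below is an illustration under stated assumptions, not the paper's implementation: it assumes the model $z_{\text{curr}} = r \, z_{\text{prev}} + \text{error}$ with an error standard deviation proportional to $|z_{\text{prev}}|$, and the function name is hypothetical.

```python
def wls_temporal_coefficient(z_prev, z_curr):
    """Weighted least squares estimate of r in z_curr = r * z_prev + error,
    with each observation weighted by 1 / |z_prev|."""
    # Observations with a zero regressor cannot be weighted and are skipped.
    pairs = [(x, y) for x, y in zip(z_prev, z_curr) if x != 0.0]
    if not pairs:
        raise ValueError("all regressor values are zero")
    # Transformed model: y / |x| = r * (x / |x|) + error / |x|
    x_star = [x / abs(x) for x, _ in pairs]
    y_star = [y / abs(x) for x, y in pairs]
    # Ordinary least squares on the transformed variables.
    numerator = sum(xs * ys for xs, ys in zip(x_star, y_star))
    denominator = sum(xs * xs for xs in x_star)
    return numerator / denominator

# Made-up, noise-free data generated with r = 0.7.
z_prev = [1.0, -2.0, 0.5, 4.0]
z_curr = [0.7 * x for x in z_prev]
r_hat = wls_temporal_coefficient(z_prev, z_curr)
```

After the transformation the regressor reduces to the sign of `z_prev`, so the estimate is the average of the ratios `z_curr / z_prev`; large observations no longer dominate the fit as they would under ordinary least squares.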

Case Study
The survey unit adopted by the Chamber of Commerce, Industry, Agriculture and Handicraft (or C.C.I.A.A.) of Lecce corresponds to a bankrupt local enterprise, i.e. an insolvent legal and economic unit, or part of it, located in one of the 97 municipalities of Salento, in the period between the years 2000 and 2003. Thus, the population is made up of local units which carried out one or more economic activities and were significantly tied to their territory. To identify these activities, we referred to a 1991 list, known as the classification of Economic Activities (or, more simply, the ATECO classification), produced by the National Institute of Statistics (ISTAT) and structured according to a number of levels (Vicari, Ferillo, and Valeri, 2009). In particular, in order to satisfy the condition of strong territorial connotation, we examined the bankruptcies of enterprises operating in 11 business sectors that depend on the resident population (Table 1). The spatial analysis of these data was handled through Hierarchical Bayesian modeling (Besag, York, and Mollié, 1991), with a data distribution (or likelihood) at the first level and an explanatory distribution (or prior distribution), i.e. the STCAR model, at the second and last level. After the unknown parameters of our model were estimated, the Gibbs sampler generated the desired probability distribution (or posterior distribution) and the resulting maps of risk.
Note finally that, for a correct interpretation of the data collected and the results obtained over a multi-year span, we created thematic maps that depict the percentiles of the distributions considered, so as to make them immediately comparable.

Construction of the Model
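The closing of an enterprise that pursued a business closely linked to the territory represents a "rare event" since, over a four-year time frame, a residential area does not change significantly. In addition, the bankruptcies in each municipality are apparently not influenced by those of the other municipalities in the province (Figure 1). For this reason, the cases of bankruptcy $y_{it}$, reported in any single municipality $i$ of the province of Lecce and for each year $t$ considered, can be interpreted as realizations of $97 \cdot 4$ independent Generalized Poisson likelihoods (Consul, 1989; Consul and Famoye, 1992) with two parameters $\lambda_{it}$ and $\theta_t$, denoted by $GP(\lambda_{it}, \theta_t)$, i.e.

$$Y_{it} \sim GP(\lambda_{it}, \theta_t), \qquad i = 1, \dots, 97, \quad t = 1, \dots, 4 .$$

In particular, a random variable $Y_{it}$ is said to have a Generalized Poisson distribution if its probability distribution is given by

$$P(Y_{it} = y_{it}) = \begin{cases} \dfrac{\lambda_{it} (\lambda_{it} + y_{it}\theta_t)^{y_{it} - 1} \, e^{-\lambda_{it} - y_{it}\theta_t}}{y_{it}!} & y_{it} = 0, 1, 2, \dots, \\[2ex] 0 & \text{for } y_{it} > m_{it} \text{ when } \theta_t < 0, \end{cases} \tag{22}$$

and its real-valued parameters $\lambda_{it}$ and $\theta_t$ satisfy the following constraints:

$$\lambda_{it} > 0, \qquad \max\left( -1, \, -\frac{\lambda_{it}}{m_{it}} \right) \le \theta_t \le 1 ,$$

where $m_{it}$ is the largest positive integer for which $(\lambda_{it} + m_{it}\theta_t) > 0$ when $\theta_t < 0$. The expected value and variance of $Y_{it}$ are finite when $\theta_t < 1$ and are given by

$$E(Y_{it}) = \frac{\lambda_{it}}{1 - \theta_t}, \qquad \mathrm{var}(Y_{it}) = \frac{\lambda_{it}}{(1 - \theta_t)^3} .$$

At this point, we consider the cases of bankruptcy $Y_t = (Y_{1t}, Y_{2t}, \dots, Y_{97t})'$, for every year $t$, as the response random variables corresponding to the given vector $Z_t = (Z_{1t}, Z_{2t}, \dots, Z_{97t})'$. As in the Poisson regression model, according to the usual log-linear specification, we stipulate that the distribution of $Y_t$, for any given $Z_t$, is the Generalized Poisson distribution given by (22) with mean $\varphi_t$, where $\varphi_t$, $t = 1, \dots, 4$, is the vector of target parameters from the point of view of mapping and statistical inference.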
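To make the Generalized Poisson distribution concrete, the following Python sketch evaluates Consul's probability function $P(Y = y) = \lambda (\lambda + y\theta)^{y-1} e^{-\lambda - y\theta} / y!$ and checks its mean and variance numerically. It is restricted, as a simplifying assumption, to $0 \le \theta < 1$, so that no truncation of the support is needed; the function name and the parameter values are illustrative.

```python
import math

def gp_pmf(y, lam, theta):
    """P(Y = y) for the Generalized Poisson distribution GP(lam, theta).

    This sketch covers lam > 0 and 0 <= theta < 1 only.
    """
    if lam <= 0 or not (0 <= theta < 1):
        raise ValueError("this sketch covers lam > 0 and 0 <= theta < 1 only")
    if y < 0:
        return 0.0
    # Work on the log scale to avoid overflow in the power and the factorial.
    log_p = (math.log(lam) + (y - 1) * math.log(lam + y * theta)
             - lam - y * theta - math.lgamma(y + 1))
    return math.exp(log_p)

lam, theta = 2.0, 0.3                      # illustrative values only
probs = [gp_pmf(y, lam, theta) for y in range(400)]

total = sum(probs)
mean = sum(y * p for y, p in enumerate(probs))
var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
```

With $\theta = 0$ the function reduces to the ordinary Poisson probability function; for $\theta > 0$ the variance $\lambda / (1 - \theta)^3$ exceeds the mean $\lambda / (1 - \theta)$, which is what allows the distribution to accommodate overdispersed counts.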
Let $Z = (Z_1, Z_2, Z_3, Z_4)$ be a vector of $n$-dimensional explanatory random fields and $Y = (Y_1, Y_2, Y_3, Y_4)$ be the vector of response random fields. It is clear that the product of the $97 \cdot 4$ distributions of $Y_{it}$ given $Z_{it}$ gives the expression of the Generalized Poisson likelihood analyzed, i.e.

$$L(y \mid z) = \prod_{t=1}^{4} \prod_{i=1}^{97} f(y_{it} \mid z_{it}) .$$

The purpose of this analysis is to obtain the spatial-temporal evolution of the distribution of the expected value $\varphi_t$ of bankruptcies in Salento, through the information contained in the observed cases and the distributional assumptions. Often, the variability of expected bankruptcies is affected by a possible spatial interaction between adjacent areas in the territory considered. In particular, the risk of bankruptcy in a given municipality of the province of Lecce depends on the risk found in municipalities that are geographically close. In this case, the structured spatial heterogeneity among the observed cases can be modeled through a series of random variables $Z_{it}$, $i = 1, \dots, 97$, $t = 1, \dots, 4$, which are present in each of the 97 municipalities of Salento and in each of the 4 years considered. According to Definition 3.1, the vector $Z$, i.e. the succession of the 4 random fields $Z_1$, $Z_2$, $Z_3$ and $Z_4$, is a STCAR model, since its joint probability density function can be obtained as follows:

$$f(z) = f(z_1) \, f(z_2 \mid z_1) \, f(z_3 \mid z_1, z_2) \, f(z_4 \mid z_1, z_2, z_3) ,$$

where the conditional distributions are those specified in Definition 3.1.

In order to assess the degree of proximity between the municipalities of the province of Lecce (Figure 2), we have chosen the measure $\phi(i, i^*)$ of the matrix (5) equal to the length $l_{(ii^*)}$ of the border shared by the municipalities $i$ and $i^*$, i.e.

$$\phi(i, i^*) = l_{(ii^*)} .$$

Therefore, the final stage of the hierarchical model is to estimate the space-time parameters of the stochastic component. More precisely, the log-likelihood function $l(y_t; \varphi_t, \phi_t)$ of the Generalized Poisson model is written following Consul (1989).
The maximum likelihood equations for the estimation of the parameters $\varphi_t$ and $\phi_t$ are obtained by equating to zero the first partial derivatives of $l(y_t; \varphi_t, \phi_t)$, i.e.

$$\frac{\partial \, l(y_t; \varphi_t, \phi_t)}{\partial \varphi_t} = 0, \qquad \frac{\partial \, l(y_t; \varphi_t, \phi_t)}{\partial \phi_t} = 0 .$$

These equations are clearly non-linear in the parameters. Consequently, the maximum likelihood estimates of $\varphi_t$ and $\phi_t$ were obtained numerically through Scilab's fsolve routine (Antonelli and Chiaverini, 2009). These approximations made it possible to estimate the unknown parameters of our STCAR model (Table 2).
The parameters of spatial association $\rho_t$, $t = 1, \dots, 4$, were obtained as the minimum points of the function (19), by the Newton-Raphson procedure (Figure 3). Consequently, given the values of $\rho_t$, it was easy to calculate the parameters of overall variability $\hat\sigma_t^2$, $t = 1, \dots, 4$, via the estimator (18). The parameters of temporal association $r_t$, $t = 1, \dots, 3$, were obtained as Weighted Least Squares estimates, based on the information provided by the relation (20).

Implementation of the Model
To construct our maps of risk, it was necessary to estimate the probability distribution of the STCAR model, given the cases of bankruptcy in the province of Lecce. Through the ... After reaching convergence, we needed to run the simulation for a further number of iterations to obtain samples that could be used for posterior inference. One way to assess the accuracy of the posterior estimates was to calculate the Monte Carlo Error (MCE) for each random field of interest. As a rule of thumb, the simulation should be run until the MCE is less than about 5% of the sample standard deviation (Table 3).
Moreover, thinning the parameter samples by keeping every 10th iteration ensured good mixing of the Markov chain. The resulting 15000 realizations made it possible to obtain the probability density function of the random field on Salento after the 4th year of the survey.
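The thinning step and the 5% rule of thumb can be sketched in plain Python as follows. The batch-means estimate of the Monte Carlo error used here is one common choice and is an assumption of this sketch, not necessarily the estimate reported in the paper's tables; the function names and the toy chain are hypothetical.

```python
import math
import statistics

def thin(samples, step):
    """Keep every `step`-th draw of a chain."""
    return samples[::step]

def batch_means_mce(samples, n_batches=20):
    """Batch-means estimate of the Monte Carlo error of the sample mean."""
    batch_size = len(samples) // n_batches
    if n_batches < 2 or batch_size < 1:
        raise ValueError("need at least two non-empty batches")
    # Mean of each consecutive batch; leftover draws at the end are ignored.
    means = [statistics.mean(samples[b * batch_size:(b + 1) * batch_size])
             for b in range(n_batches)]
    return statistics.stdev(means) / math.sqrt(n_batches)

def mce_is_small(samples, n_batches=20, fraction=0.05):
    """Rule of thumb: MCE below about 5% of the sample standard deviation."""
    return batch_means_mce(samples, n_batches) < fraction * statistics.stdev(samples)

# Toy chain that alternates between two values (made-up data).
chain = [0, 1] * 50
kept = thin(list(range(100)), 10)          # every 10th iteration
error = batch_means_mce(chain, n_batches=10)
ok = mce_is_small(chain, n_batches=10)
```

In practice the check would be applied to the retained draws of each monitored quantity, and the simulation extended until `mce_is_small` holds for all of them.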

Conclusion
This article has focused on the formulation of a new Conditional Auto-Regressive (CAR) model, the Spatial Temporal Conditional Auto-Regressive (STCAR) model, capable of treating measurements recorded at each spatial location and at different periods. The main feature of the proposed model is the presence of a space-time autoregressive matrix that handles the spatial dependence between sites as well as the temporal dependence among the realizations.
In particular, using the STCAR model as the distribution of the random effects in the mean structure of a Hierarchical Bayesian model, it was possible to obtain, through a particular Markov Chain Monte Carlo (MCMC) method known as the Gibbs sampler, a series of maps of the risk of bankruptcy of small and medium enterprises located in the 97 municipalities of the province of Lecce, in the period between the years 2000 and 2003. The trend illustrated by these maps showed that, over time, the municipalities affected by the mall and, for that reason, classified as being at high risk of sudden bankruptcy become attractive again to small and medium local enterprises, which integrate with the mall and take advantage of the synergy it creates.
Finally, the validation of the hypothesized hierarchical model confirmed the validity of the results achieved by the proposed STCAR model.

Figure 2: Neighborhood for municipalities of the province of Lecce.

Figure 4: Expected bankruptcies in the province of Lecce (C.C.I.A.A. data processing).

Figure 5: Probability of risk of bankruptcy in the province of Lecce (C.C.I.A.A. data processing).