Performance Evaluations of Gaussian Spatial Data Classifiers Based on Hybrid Actual Error Rate Estimators

Discrimination and classification of spatial data have been widely mentioned in the scientific literature, but they lack a full mathematical treatment and easily available algorithms and software. This paper fills this gap by introducing a method of statistical classification based on the Bayes discriminant function (BDF) and by providing an original approach to classifier performance evaluation. Supervised classification of spatial data with the response variable modelled by Gaussian random fields (GRF) with continuous or discrete spatial index is studied. The populations are assumed to have different vectors of regression parameters. A classification rule based on the BDF with plugged-in ML estimators of the regression and scale parameters is studied. We focus on the derived actual error rate (AER) and the approximation of the expected error rate (AEER) for both types of models. These are used in the construction of hybrid actual error rate estimators that are spatial modifications of the widely applicable D and O estimators used for independent observations. Simulation experiments compare the proposed AER estimators by the minimum unconditional mean squared error criterion for both types of GRF models.


Introduction
Statistical classification and discriminant analysis of spatial data have been mentioned in the scientific literature, but they lack a full mathematical treatment and easily available algorithms and software. This paper fills this gap by proposing a method for evaluating and comparing Gaussian spatial models based on classification error rate estimators, and by providing novel formulas and algorithms that make it possible to evaluate the influence of spatial information on the performance of the proposed classifier. The actual and expected error rates for supervised classification of Gaussian random field (GRF) observations via the plug-in Bayes discriminant function (PBDF) are studied by Ducinskas (2009) in the case of partial parametric uncertainty and by Ducinskas and Dreiziene (2011) in the case of complete parametric uncertainty. Given a training sample, the classifier under study is obtained by substituting the model parameters with their estimators in the well-known Bayes rule. Spatial discrimination based on the PBDF for feature observations having elliptically contoured distributions is implemented by Batsidis and Zografos (2011). A numerical comparison of the performance of different spatial classification rules is given by Berrett and Calder (2016). In the above-mentioned papers, the main attention is paid to the so-called geostatistical (GS) models with continuous spatial index and directly specified covariance functions of Matérn type or other parametric forms. However, many researchers have intensively explored spatial Gaussian data observed over a lattice. Two classes of spatial linear models for lattice data are used in practice: the conditional autoregressive (CAR) model and the simultaneous autoregressive (SAR) model, both of which follow a neighbourhood structure on the lattice.
In the spatial statistics literature, CAR models are most often used for the analysis of lattice/areal data, since the majority of authors argue that CAR models, being a subclass of Markov random fields, are more general than SAR models (see, e.g., Sain and Cressie 2007). In CAR models the spatial dependence is induced by the conditional distributions of the random errors at individual locations (sites). The practical use of Gaussian Markov random fields (GMRF) (see, e.g., Rue and Held 2005) for modelling large-scale spatial phenomena has increased significantly after recent advances in efficient simulation. Without significant loss of generality, we restrict our attention to homogeneous CAR (HCAR) lattice models (Song and De Oliveira 2012) with the original parametric structure proposed by De Oliveira and Ferreira (2011). These models are well suited to small samples and ensure good frequentist properties of the ML estimators of the drift and scale parameters. Spatial classification based on the PBDF for univariate HCAR models with the above-mentioned structure was recently explored by Ducinskas and Dreiziene (2018).
In the present paper we focus on the linear classification problem for GRF observations under GS models as well as HCAR models by using the PBDF. The main theoretical objective is to study the properties of two types of actual error rate estimators based on previously derived analytic expressions for the AER and the AEER. The proposed hybrid estimators, denoted SD and SO, are formed by replacing the theoretical parameters with their estimators in the AER and the AEER, respectively. They are spatial modifications of the D and O estimators applied to classification problems with independent Gaussian observations (see, e.g., Lachenbruch and Mickey 1968; Snapinn and Knoke 1984; Egbo 2016). A comparison of geostatistical models for ecological data based on correct classification rates is performed by Ducinskas and Dreiziene (2019) and Dreiziene and Ducinskas (2020). This paper is organized as follows: the problem description and the definitions of classifiers based on the BDF are given in the next section; the actual error rates and their estimators are presented in Section 3. Section 4 illustrates the proposed methods by simulation experiments and, finally, the conclusions are given in the last section.

Main concepts and definitions
In this paper we focus on the classification of spatial data that can be considered as a realization of a random field {Z(s) : s ∈ S ⊂ R^2}. The universal kriging model, which belongs to the subclass of spatial linear models, is explored. The model of the observation Z(s) for s ∈ S in population Ω_l is (Haining 1990; Cressie and Wikle 2011) Z(s) = x′(s)β_l + ε(s), where x(s) is a q × 1 vector of non-random regressors and β_l is a q × 1 vector of unknown parameters, l = 1, 2, β_1 ≠ β_2. The error term ε(s) is a zero-mean GRF {ε(s) : s ∈ S} with known covariance function σ(s, t) = cov(ε(s), ε(t)), s, t ∈ S.
We first consider the geostatistical (GS) model of random fields, for which the spatial index s is assumed to vary continuously throughout S.
Suppose that {s_i ∈ S, i = 0, 1, ..., n} is the set of spatial sites where the observations of the GRF are taken. Denote the set of training sites by S_n = S^(1) ∪ S^(2), where S^(1) = {s_1, s_2, ..., s_{n_1}} and S^(2) = {s_{n_1+1}, ..., s_{n_1+n_2}}, n = n_1 + n_2. Suppose that S^(l) are the subsets of S_n that contain n_l observations of Z(s) from Ω_l, l = 1, 2. The site of the observation to be classified is denoted by s_0 and will be called the focal location (see, e.g., Berrett and Calder 2016). Set S_n^0 = S_n ∪ {s_0}. Put β = (β_1, β_2), α_0 = Σ^{−1} c_0, and denote by X_l the n_l × q matrix of regressors for observations from Ω_l, l = 1, 2. Then the n × 2q design matrix of the training sample Z is specified by X = X_1 ⊕ X_2, where the symbol ⊕ denotes the direct sum of matrices.
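The direct-sum layout of the design matrix can be illustrated with a small sketch. The helper name `direct_sum` and the toy regressor values are hypothetical and serve only as an illustration; the only thing taken from the text is the block structure of the n × 2q matrix built from the two n_l × q blocks.

```python
def direct_sum(x1, x2):
    """Return the block-diagonal matrix x1 (+) x2 as a list of rows."""
    q1, q2 = len(x1[0]), len(x2[0])
    top = [list(row) + [0.0] * q2 for row in x1]     # rows of population 1
    bottom = [[0.0] * q1 + list(row) for row in x2]  # rows of population 2
    return top + bottom

# Toy example (assumed values): q = 2 regressors (intercept and one covariate),
# n1 = 2 training sites in the first population, n2 = 3 in the second.
X1 = [[1.0, 0.5], [1.0, 1.5]]
X2 = [[1.0, 2.0], [1.0, 2.5], [1.0, 3.0]]
X = direct_sum(X1, X2)  # 5 rows, 4 columns
```

With this layout, multiplying X by the stacked 2q-vector of regression parameters gives the population-1 drift in the first n_1 rows and the population-2 drift in the remaining n_2 rows.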
In what follows we use the following notations for i, j = 0, ..., n:
The problem considered in this paper is the following: for a given training sample Z, classify Z_0 into one of the two populations described below. Let z denote the realization of Z. Then the conditional distribution of Z_0 given Z = z in population Ω_l is Gaussian.
First consider the GS model for spatial data. For this model we consider the case of stationary random errors. Assume that the covariance function is a directly specified parametric function σ(s, t) = σ^2 r(s − t), where r(·) is the spatial correlation function and σ^2 = σ_ii, i = 0, ..., n.
Under the assumption of complete parametric certainty, the Bayes discriminant function (BDF) minimizing the probability of misclassification (PMC) is formed by the log ratio of the conditional likelihoods.
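For two Gaussian conditional densities with a common variance, a log-ratio discriminant of this kind reduces to a linear function of Z_0. The following minimal sketch assumes that the conditional means `mu1`, `mu2` and the common conditional variance `var0` are already available and that the prior probabilities `pi1`, `pi2` enter through their log ratio. All names are illustrative and are not the paper's notation.

```python
import math

def gaussian_logpdf(z0, mean, var):
    """Log density of N(mean, var) evaluated at z0."""
    return -0.5 * math.log(2.0 * math.pi * var) - (z0 - mean) ** 2 / (2.0 * var)

def bayes_discriminant(z0, mu1, mu2, var0, pi1=0.5, pi2=0.5):
    """Log ratio of prior-weighted conditional Gaussian likelihoods.

    A non-negative value assigns z0 to population 1, a negative one to population 2.
    """
    return (gaussian_logpdf(z0, mu1, var0) + math.log(pi1)
            - gaussian_logpdf(z0, mu2, var0) - math.log(pi2))

def bayes_discriminant_linear(z0, mu1, mu2, var0, pi1=0.5, pi2=0.5):
    """Equivalent closed form: linear in z0 because the variances coincide."""
    return (z0 - 0.5 * (mu1 + mu2)) * (mu1 - mu2) / var0 + math.log(pi1 / pi2)

w_log_ratio = bayes_discriminant(0.8, 1.0, 0.0, 1.0)
w_linear = bayes_discriminant_linear(0.8, 1.0, 0.0, 1.0)
```

Both functions return the same value; the linear form shows that the decision boundary is the midpoint of the two conditional means when the priors are equal.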
In the present paper we use the ML estimator of the regression parameters and the bias-adjusted ML estimator of the scale parameter. By replacing the parameters with their estimators in (5) we form the PBDF, with F_+ = (I_q, I_q) and F_− = (I_q, −I_q), where I_q denotes the identity matrix of order q.
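As an illustration of this estimation step, the sketch below uses the standard generalized least squares form, which coincides with the ML estimator of the regression parameters in a Gaussian model with a known correlation matrix, and divides the residual quadratic form by n minus the number of regression parameters as a bias adjustment. This is an assumed textbook form, not a reproduction of the paper's expressions; the function name, the one-dimensional toy sites and the parameter values are hypothetical.

```python
import numpy as np

def fit_drift_and_scale(X, Z, R):
    """GLS fit with known correlation matrix R.

    X: (n, p) design matrix, Z: (n,) training observations, R: (n, n).
    Returns the drift estimate and the bias-adjusted scale estimate.
    """
    n, p = X.shape
    R_inv_X = np.linalg.solve(R, X)
    R_inv_Z = np.linalg.solve(R, Z)
    beta_hat = np.linalg.solve(X.T @ R_inv_X, X.T @ R_inv_Z)
    resid = Z - X @ beta_hat
    sigma2_hat = resid @ np.linalg.solve(R, resid) / (n - p)
    return beta_hat, sigma2_hat

# Toy data (assumed): 8 sites on a line, exponential correlation with range 2,
# constant mean in each population (q = 1, so the design matrix has 2 columns).
rng = np.random.default_rng(0)
sites = np.arange(8.0)
R = np.exp(-np.abs(sites[:, None] - sites[None, :]) / 2.0)
X = np.zeros((8, 2))
X[:4, 0] = 1.0   # population 1 rows
X[4:, 1] = 1.0   # population 2 rows
beta_true = np.array([0.0, 1.0])
Z = X @ beta_true + np.linalg.cholesky(R) @ rng.standard_normal(8)
beta_hat, sigma2_hat = fit_drift_and_scale(X, Z, R)
```

Using `np.linalg.solve` instead of an explicit inverse of R is a numerical design choice and does not change the estimator.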
The expectation of the actual error rate with respect to the distribution of Z is called the expected error rate (EER). The EER is useful as a guide to the performance of the plug-in classification rule before the rule is actually formed from the training sample. It can be regarded as a performance measure for the PBDF in the same way that the mean squared prediction error (MSPE) is a performance measure for the plug-in kriging predictor (see Diggle, Ribeiro, and Christensen 2003). These facts strengthen the motivation for deriving the AEER associated with the PBDF.
The AER is a function of the discriminant function, but the distribution of the PBDF based on unknown parameters is quite complicated, so analytical expressions for the error rates are difficult to obtain. Therefore the AER has to be estimated by error rate estimators, which are explained below. One type of AER estimator is based on an approximation of the AER (see Ducinskas 2009).
In the present paper we use the AEER derived in Lemma 2.
Lemma 2. The approximation of the EER for the PBDF W_z(Z_0, Ψ̂) is
Proof. The proof of Lemma 2 is based on the second-order Taylor expansion of P(Ψ̂) about the point Ψ̂ = Ψ, and follows the proof of the Theorem in Ducinskas (2009).
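Once the training sample is fixed, a plug-in rule that is linear in Z_0 has an actual error rate that can be written through the standard normal distribution function. The sketch below assumes a discriminant of the form a + b·Z_0, an observation Z_0 distributed as N(mu_l, var0) in population l, and assignment to population 1 when the discriminant is non-negative. The coefficients `a`, `b` and all other names are illustrative, not the paper's notation.

```python
import math

def std_normal_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def actual_error_rate(a, b, mu1, mu2, var0, pi1=0.5, pi2=0.5):
    """Error rate of the rule 'population 1 if a + b*Z0 >= 0' (b must be non-zero)."""
    sd_w = abs(b) * math.sqrt(var0)               # standard deviation of a + b*Z0
    err1 = std_normal_cdf(-(a + b * mu1) / sd_w)  # P(W < 0 | population 1)
    err2 = std_normal_cdf((a + b * mu2) / sd_w)   # P(W >= 0 | population 2)
    return pi1 * err1 + pi2 * err2

# Rule built from the true parameters (mu1 = 1, mu2 = 0, var0 = 1, equal priors)
bayes_rate = actual_error_rate(a=-0.5, b=1.0, mu1=1.0, mu2=0.0, var0=1.0)
# Rule with a perturbed intercept, standing in for estimated parameters
plugin_rate = actual_error_rate(a=-0.2, b=1.0, mu1=1.0, mu2=0.0, var0=1.0)
```

In this toy setting the rule built from the true parameters attains the error rate Φ(−1/2), and the perturbed rule has a larger error rate, which mirrors the gap between the Bayes error and the AER of a plug-in rule.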
Here we propose spatial modifications of the widely used error rate estimators D and O and denote them by SD and SO. Note that each of the methods of error rate estimation described in this section is given a symbol to identify it; the symbol is attached to the estimator as a superscript.
The effectiveness of these estimators is assessed by the unconditional mean squared error (UMSE).
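In a simulation study, a criterion of this kind is typically approximated by averaging, over replications, the squared difference between the estimated error rate and the actual error rate of the rule built from the same training sample. The sketch below follows that common definition, which is an assumption here; the function name and the numbers are hypothetical and are not results from the paper.

```python
def umse(estimates, actual_rates):
    """Monte Carlo UMSE: mean squared difference between estimator and AER."""
    if not estimates or len(estimates) != len(actual_rates):
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(e - a) ** 2 for e, a in zip(estimates, actual_rates)]
    return sum(squared) / len(squared)

# Hypothetical values for M = 3 replications
estimated = [0.30, 0.28, 0.33]
actual = [0.31, 0.30, 0.30]
value = umse(estimated, actual)
```

Two estimators can then be compared by evaluating `umse` on the same replications and taking the difference of the two values.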

Simulation experiments
In order to illustrate the results of the previous section, a numerical example is considered. The proposed AER estimators are compared with respect to the minimum UMSE for GS and HCAR models. We consider the empirical estimators of the error rates incurred by the rule based on the proposed PBDF for a stationary GRF.
Assume that the data are sampled on an 11 × 11 regular lattice with unit spacing and with the focal location in the centre of the lattice (see Figure 1). Equal-sized training samples with equal prior probabilities are considered, that is, n_1 = n_2 = 60 and π_1^0 = π_2^0 = 1/2. Spatial correlation in the GS case is modelled by the isotropic exponential covariance function σ(h) = σ^2 exp(−h/α). The simulations used M = 100 replications with true parameters σ^2 = 1 and α = 2 and were carried out with geoR, a package for the R project for statistical computing. For the HCAR case the simulations were performed according to the algorithm based on Cholesky factorisation proposed by Rue and Held (2005). Spatial weights w_ij typically reflect the spatial influence of the observation at site s_i on the observation at site s_j. Here we use power distance weights of the form w_ij = d_ij^(−2), where d_ij is the Euclidean distance between sites s_i and s_j. The values of the proposed actual error rate estimators (4), (5) and their UMSE (6) for both GRF types are presented in Table 1 and Table 2. For every value of the Mahalanobis distance Δ_0 considered, the magnitude of the UMSE shows the advantage of the SO estimator over the SD estimator, since ΔUMSE = UMSE^SD − UMSE^SO > 0. The tables also show that the effect of the separation level between the populations is evident, i.e., the estimators and their UMSE decrease as Δ_0 increases.
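The geometry of this design can be sketched as follows: an 11 × 11 unit-spaced lattice, the central site as focal location, the exponential covariance with σ^2 = 1 and α = 2, and inverse squared distance weights. The sketch does not reproduce the geoR or HCAR simulations. The function names are hypothetical, and setting the weight of a site with itself to zero is an assumed convention.

```python
import math

SIDE = 11
sites = [(i, j) for i in range(SIDE) for j in range(SIDE)]
focal = (SIDE // 2, SIDE // 2)                  # centre of the lattice
training = [s for s in sites if s != focal]     # 120 training sites

def distance(s, t):
    """Euclidean distance between two lattice sites."""
    return math.hypot(s[0] - t[0], s[1] - t[1])

def exp_cov(h, sigma2=1.0, alpha=2.0):
    """Isotropic exponential covariance sigma2 * exp(-h / alpha)."""
    return sigma2 * math.exp(-h / alpha)

def power_weight(s, t):
    """Power distance weight d^(-2); zero on the diagonal (assumed convention)."""
    d = distance(s, t)
    return 0.0 if d == 0.0 else d ** -2

# Covariances and weights between the focal location and each training site
c0 = [exp_cov(distance(focal, s)) for s in training]
w0 = [power_weight(focal, s) for s in training]
```

The vector `c0` shows how quickly the dependence on the focal observation decays with distance under the chosen range parameter, while `w0` plays the analogous role for the lattice weights.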

Conclusions
For spatial data modelled by two types of GRF, novel actual error rate estimators for classifiers based on the PBDF are proposed and explored. The performance of the proposed estimators is evaluated by the UMSE criterion via simulation experiments.
The simulation study with training samples of moderate size shows that, for both types of GRF models, the SO estimator has an advantage over the SD estimator with respect to the minimum UMSE criterion.
Hence, adding the supplementary term to the SD estimator slightly improves its effectiveness. This confirms the usefulness of deriving expected error rate approximations for classifiers of spatial data modelled by GRF.