Kriging and Prediction of Nonlinear Functionals

The prediction of a nonlinear functional of a random field is studied using covariance-matching constrained kriging. It is proved that the optimization problem induced by this method always has a solution. The proof is constructive, and it provides an algorithm for finding the optimal solution. Using simulation, this algorithm is compared with the method given in Aldworth and Cressie (2003).


The Nonlinear Prediction Problem
A random field z(t), t ∈ D ⊂ R^d, of the form (1) is observed at the locations t_1, ..., t_n. Kriging means optimally predicting the unobserved values of the random field z. The method of kriging is well known (see, e.g., Krige (1951), Cressie (1991)). Several modifications of the kriging procedure have been studied; e.g., in Fazekas and Kukush (2003) a version of kriging is analyzed in which the locations where the field is observed are not known precisely.
In this paper, we study problems that require the prediction of nonlinear functionals of the field z. To solve this kind of problem, Aldworth and Cressie (2003) introduced a predictor called covariance-matching constrained kriging (see also Cressie (1993)). It is an optimal linear predictor that matches not only first moments but second moments (including covariances) as well. Aldworth and Cressie (2003) proved some properties of covariance-matching constrained kriging and applied it to real-life problems. Here we prove some further properties of that predictor.
We want to predict g(s), where g is a known scalar-valued function that is nonlinear in its vector-valued argument s. Here s = (s(B_1), ..., s(B_m))′ is a function of the underlying random field z. To predict g(s), consider a predictor of the form g(ŝ), where ŝ = A′z is a linear predictor of s and A is an n × m matrix to be calculated. To obtain an approximately unbiased predictor, Aldworth and Cressie (2003) used Taylor expansions up to second order of g(A′z) and g(s). Their approach gave the following constraints.
Assume that A satisfies the constraints E(A′z) = E s and var(A′z) = var(s). The first is the standard unbiasedness condition. If s and z are Gaussian, then the above conditions imply that A′z and s have the same distribution. Therefore, in the Gaussian case, g(A′z) is an unbiased predictor of g(s).
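In the Gaussian case this unbiasedness can be checked by simulation. The following sketch (illustrative sizes, a hypothetical mean, covariance, and functional g; not the paper's simulation) draws two Gaussian samples with matched first and second moments and compares the means of g:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n_sim = 3, 200_000

# Illustrative (hypothetical) mean vector and covariance matrix.
mu = np.array([1.0, -0.5, 2.0])
L = rng.normal(size=(m, m))
Sigma_m = L @ L.T + m * np.eye(m)        # positive definite

def g(s):
    # A nonlinear functional of the vector argument, e.g. its maximum.
    return np.max(s, axis=-1)

# Two Gaussian vectors with identical first and second moments: they play
# the roles of A'z and s once the matching constraints hold.
chol = np.linalg.cholesky(Sigma_m)
u = mu + rng.normal(size=(n_sim, m)) @ chol.T
v = mu + rng.normal(size=(n_sim, m)) @ chol.T

# Equal distributions imply equal expectations of g, up to Monte Carlo error.
bias_gap = abs(g(u).mean() - g(v).mean())
print(bias_gap)
```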
As the L² error, i.e., E(g(A′z) − g(s))², cannot be calculated directly, Aldworth and Cressie (2003) used a first-order Taylor expansion to approximate it. They obtained the expression y′ var(A′z − s) y, where y = g′(µ_m) = ∂g(µ_m)/∂x and µ_m = E s.
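Spelling out the approximation may help; the following is a sketch consistent with the constraints above, not a verbatim reproduction of the Aldworth-Cressie derivation:

```latex
% Linearize g around \mu_m = Es; by the unbiasedness constraint, E(A'z - s) = 0:
E\bigl(g(A'z)-g(s)\bigr)^2 \approx y'\,\mathrm{var}(A'z-s)\,y .
% Expand the variance and use the matching constraint var(A'z) = A'\Sigma A = \Sigma_m:
\mathrm{var}(A'z-s) = A'\Sigma A + \Sigma_m - A'C - C'A = 2\Sigma_m - A'C - C'A ,
% hence
y'\,\mathrm{var}(A'z-s)\,y = 2\,y'\Sigma_m y - 2\,y'A'Cy ,
% so minimizing the approximate L^2 error amounts to maximizing y'A'Cy.
```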
To obtain a predictor that can be calculated numerically, we assume that s satisfies the following. (These assumptions are realistic if s is a linear function of z, e.g., if it is defined by an integral of z.) E s = µ_m = X_m′β, where X_m is a known matrix of type (p + 1) × m. Moreover, var(s) = Σ_m and cov(z, s) = C, where Σ_m is a known matrix of type m × m, while C is a known matrix of type n × m. So we obtain the following objective function and constraints.
Find the matrix A such that the objective function

y′A′Cy   (2)

is maximal subject to the constraints

E(A′z) = E s,   (3)

var(A′z) = var(s),   (4)

where y = g′(µ_m). Aldworth and Cressie (2003) gave a matrix A_0 that maximizes the Lagrangian function corresponding to the maximum problem (2)-(4) at m linearly independent directions of y. However, the original problem is to find the maximum for y = g′(µ_m). Here we shall prove (see Theorem 1) that, under realistic conditions, the maximum problem (2)-(4) has a solution whatever y may be. Actually, we construct the optimal solution. This construction serves as the basis of a numerical method which we call algorithm M. Using computer simulation, we shall compare algorithm M with the naive estimator and the method suggested by Aldworth and Cressie (2003).

The Solution of the Optimization Problem
Theorem 1. Let Σ be a real symmetric positive definite matrix of type n × n, Σ_m a real symmetric positive definite matrix of type m × m, C a real matrix of type n × m, X_m a real matrix of type (p + 1) × m, and X a real matrix of type n × (p + 1) with rank p + 1. Assume that n − (p + 1) ≥ m, and define

P = Σ_m − X_m′(X′Σ^{-1}X)^{-1}X_m.   (5)

(a) Then there exists a matrix A satisfying conditions (3) and (4) if and only if P is positive semidefinite.
(b) If P is positive semidefinite, then the maximum problem (2)-(4) has a solution.

Proof. Introduce the following notation:

E = Σ^{1/2}AΣ_m^{-1/2},  Y = X′Σ^{-1/2},  B = X_mΣ_m^{-1/2}.
Then constraints (3)-(4) are equivalent to the following. Find the matrix E satisfying

YE = B,   (6)

E′E = I,   (7)

where I is the unit matrix of type m × m, and B and Y are known matrices.
(a1) First consider the case m = 1. Then constraints (6)-(7) are equivalent to

Ye = b,  e′e = 1,   (8)

where Y is a given matrix of type (p + 1) × n with rank p + 1, b is a given vector of dimension p + 1, while we are looking for the unknown n-dimensional vector e.
By the assumptions of the theorem, n > p + 1. Then the general solution of Ye = b is a non-trivial linear manifold. The particular solution e_0 of Ye = b having minimal length is the one that is orthogonal to each solution of the homogeneous linear equation Ye = 0. Therefore, if e is orthogonal to the rows of Y, then e_0 is orthogonal to e. So e_0 lies in the subspace spanned by the rows of Y; in fact, e_0 = Y′(YY′)^{-1}b. Any solution of Ye = b can be written in the form e = e_0 + e_1, where e_1 is a vector with Ye_1 = 0. As e_1 is orthogonal to e_0, our condition is 1 = e′e = ‖e_0‖² + ‖e_1‖². This can be satisfied if and only if ‖e_0‖² ≤ 1. So we obtained the following statement: there exists a vector e satisfying (8) if and only if

‖e_0‖² = b′(YY′)^{-1}b ≤ 1.   (9)
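The minimal-length solution can be computed numerically; the sketch below (illustrative sizes, random data) uses the fact that np.linalg.lstsq returns exactly the minimal-norm solution of an underdetermined system:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p1 = 8, 4                  # p1 plays the role of p + 1; n > p + 1

Y = rng.normal(size=(p1, n))  # full row rank with probability one
b = rng.normal(size=p1)

# Minimal-norm solution e0 = Y'(YY')^{-1} b of the underdetermined
# system Y e = b; lstsq returns this solution.
e0 = np.linalg.lstsq(Y, b, rcond=None)[0]
e0_explicit = Y.T @ np.linalg.solve(Y @ Y.T, b)

# A unit-length solution e = e0 + e1 (with Y e1 = 0) exists iff ||e0|| <= 1.
print(np.linalg.norm(e0))
```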
This is equivalent to the condition that the matrix P defined in (5) is positive semidefinite. So we obtained part (a) in the case m = 1.
(a2) Now let m be an arbitrary positive integer. If there exists an E satisfying (6)-(7), then for any a ∈ R^m, a ≠ 0, we have Y(Ea) = Ba and ‖Ea‖ = ‖a‖. So e = Ea/‖a‖ is a solution of (8) with b = Ba/‖a‖. By the previous part of the proof, this implies that (9) is satisfied, i.e.,

a′(I − B′(YY′)^{-1}B)a ≥ 0

for every a, so the matrix I − B′(YY′)^{-1}B
is positive semidefinite. As P = Σ_m^{1/2}(I − B′(YY′)^{-1}B)Σ_m^{1/2}, the matrix P defined in (5) is positive semidefinite.
Now let P be positive semidefinite. Find the solution of (6)-(7) in the form

E = E_0 + MV,   (10)

where E_0 = Y′(YY′)^{-1}B and the matrix V is to be determined. We remark that in Aldworth and Cressie (2003) the same scheme was applied; they obtained the solution that is optimal at m linearly independent directions of y. Here

M = I − Y′(YY′)^{-1}Y

is the orthogonal projection onto the orthogonal complement of the subspace generated by the columns of Y′. As YM = O and ME_0 = O, we obtain that (6) is satisfied. Now, by (7), we have to solve E_0′E_0 + V′MV = I. This equation is equivalent to

V′MV = N,   (11)

where N = I − B′(YY′)^{-1}B. Here the matrix M of type n × n is an orthogonal projection onto a subspace of dimension n − (p + 1) (therefore M is symmetric, idempotent and positive semidefinite).
The matrix N = I − B′(YY′)^{-1}B on the right-hand side of (11) is symmetric, and it is positive semidefinite because N = Σ_m^{-1/2}PΣ_m^{-1/2} (and P is positive semidefinite). Now we shall find a solution V of (11) using the assumption n − (p + 1) ≥ m. Let H be an orthogonal matrix (containing orthonormal eigenvectors of M) such that

H′MH = diag(1, ..., 1, 0, ..., 0),

where on the right-hand side stands a diagonal matrix whose first n − (p + 1) diagonal elements are ones while the remaining elements are zeros. Let V′ = (N^{1/2}, O)H′, where O is a block of zero elements of type m × (n − m). Then

V′MV = (N^{1/2}, O) H′MH (N^{1/2}, O)′ = N,

because m ≤ n − (p + 1). So (11) holds, E = E_0 + MV satisfies (6)-(7), and we obtained part (a) for arbitrary m.
(b) Now we turn to part (b). Constraints (3)-(4) define a bounded closed set of matrices A. The objective function (2) is a continuous (linear) function of A. As a continuous function attains its maximum on a compact set, part (b) is proved. □
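The construction of E = E_0 + MV can be sketched numerically. In the snippet below the sizes are illustrative, and B is generated from a feasible point so that the solvability condition holds by construction:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p1, m = 10, 4, 2               # needs n - (p+1) >= m; p1 plays p + 1

Y = rng.normal(size=(p1, n))
E_star = np.linalg.qr(rng.normal(size=(n, m)))[0]  # a feasible point
B = Y @ E_star                    # makes the constraints satisfiable

# Particular solution and orthogonal projector onto the null space of Y.
YYt_inv = np.linalg.inv(Y @ Y.T)
E0 = Y.T @ YYt_inv @ B            # minimal solution of Y E = B
M = np.eye(n) - Y.T @ YYt_inv @ Y # projector of rank n - (p+1)

# N = I - B'(YY')^{-1}B is positive semidefinite here by construction.
N = np.eye(m) - B.T @ YYt_inv @ B
w, H = np.linalg.eigh(M)          # ascending eigenvalues: 0,...,0,1,...,1
Hr = H[:, p1:]                    # orthonormal basis of the range of M

# Symmetric square root of N, then V with V' M V = N.
evals, evecs = np.linalg.eigh(N)
N_half = evecs @ np.diag(np.sqrt(np.clip(evals, 0, None))) @ evecs.T
V = Hr[:, :m] @ N_half            # possible because rank(M) >= m

E = E0 + M @ V
assert np.allclose(Y @ E, B)                # constraint (6)
assert np.allclose(E.T @ E, np.eye(m))      # constraint (7)
```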

A Construction of the Solution of Problem (2)-(4)
Here we give a constructive proof for part (b) of Theorem 1. An algorithm will be based on this construction.
(b1) First we give a simple proof of part (b) in the case m = 1. Now the objective function y′A′Cy is of the form w′e, where w = y²Σ_m^{1/2}Σ^{-1/2}C (as m = 1, y and Σ_m are scalars). Here e is to be determined. We can assume that w ≠ 0 (if w = 0, then the maximum of the objective function is always 0). By the above calculations, if there is a vector satisfying the constraints, then it is of the form e = e_0 + e_1, where e_0 is uniquely determined and its norm is not greater than 1, while e_1 is not uniquely determined but is a solution of Ye_1 = 0 with ‖e_1‖² = 1 − ‖e_0‖². Let w = w_0 + w_1, where w_0 is in the subspace spanned by the rows of Y and w_1 is orthogonal to the rows of Y. Then w′e = w_0′e_0 + w_1′e_1 is maximal if and only if e_1 is parallel to w_1 (use the Cauchy-Schwarz-Buniakovskii inequality). So we obtained part (b) of Theorem 1 in the case m = 1.
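A minimal numerical sketch of this m = 1 construction (illustrative sizes; b is generated from a unit vector so that ‖e_0‖ ≤ 1 holds automatically):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p1 = 8, 4                      # p1 plays the role of p + 1

Y = rng.normal(size=(p1, n))
w = rng.normal(size=n)            # direction defining the objective w'e

YYt_inv = np.linalg.inv(Y @ Y.T)
M = np.eye(n) - Y.T @ YYt_inv @ Y # projector onto the null space of Y

# Feasible right-hand side: b = Y u for a unit vector u, so ||e0|| <= 1.
u = np.linalg.qr(rng.normal(size=(n, 1)))[0][:, 0]
b = Y @ u
e0 = Y.T @ YYt_inv @ b            # minimal-norm solution of Y e = b

w1 = M @ w                        # component of w orthogonal to rows of Y
e1 = np.sqrt(max(1 - e0 @ e0, 0)) * w1 / np.linalg.norm(w1)
e = e0 + e1                       # maximizes w'e subject to Ye = b, ||e|| = 1

assert np.allclose(Y @ e, b)
assert np.isclose(e @ e, 1.0)
```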
(b2) Now we turn to part (b) in the case of arbitrary m. The proof is the basis of algorithm M.
We slightly reshape constraints (6)-(7). As the p + 1 rows of Y are linearly independent, we can apply Gram-Schmidt orthogonalization to them. This can be described by an invertible matrix G.
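Numerically, this Gram-Schmidt step amounts to a QR factorization; the realization of G below via QR is our illustration (illustrative sizes):

```python
import numpy as np

rng = np.random.default_rng(4)
n, p1 = 8, 4
Y = rng.normal(size=(p1, n))      # rank p + 1 with probability one

# Gram-Schmidt on the rows of Y is a QR factorization of Y':
# Y' = Q R with Q having orthonormal columns, hence (R')^{-1} Y = Q'.
Q, R = np.linalg.qr(Y.T)
G = np.linalg.inv(R.T)            # an invertible matrix G as in the text
W = G @ Y                         # rows of W are orthonormal vectors

assert np.allclose(W @ W.T, np.eye(p1))
```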
After this transformation, constraints (6)-(7) take the form

e_i′w_j = v_ij (i = 1, ..., m, j = 1, ..., p + 1),  e_i′e_k = δ_ik,   (12)

where w_1, ..., w_{p+1} ∈ R^n are given orthonormal vectors, v_ij, i = 1, ..., m, j = 1, ..., p + 1, are given numbers, and we have to find e_1, ..., e_m ∈ R^n. Let L denote the subspace of R^n spanned by w_1, ..., w_{p+1} and L⊥ its orthogonal complement. We search for e_1, ..., e_m in the form

e_i = Σ_{j=1}^{p+1} v_ij w_j + α_i l_i,  i = 1, ..., m,   (13)

where l_1, ..., l_m are unit (in some special cases zero) vectors from L⊥. (As w_1, ..., w_{p+1} are orthonormal vectors, the above representation is valid for vectors e_1, ..., e_m satisfying (12).) Because we assume that constraints (6)-(7) can be satisfied, there exist orthonormal vectors e_1, ..., e_m ∈ R^n with the above representation. This implies that Σ_{j=1}^{p+1} v_ij² ≤ 1 for i = 1, ..., m, and

α_i = (1 − Σ_{j=1}^{p+1} v_ij²)^{1/2},  i = 1, ..., m.   (14)

By the orthogonality of e_i and e_k, we have α_i α_k l_i′l_k = −Σ_{j=1}^{p+1} v_ij v_kj for i ≠ k. So we have to find vectors l_1, ..., l_m from L⊥ such that

‖l_i‖ = 1 (if α_i ≠ 0),  α_i α_k l_i′l_k = −Σ_{j=1}^{p+1} v_ij v_kj (i ≠ k).   (15)

The objective function to be maximized is

Σ_{i=1}^{m} x_i e_i′f,

where f = Σ^{-1/2}Cy and x = Σ_m^{1/2}y. Inserting the representation (13), the objective function equals Σ_i x_i Σ_j v_ij w_j′f + Σ_i x_i α_i l_i′f. As the first summand here is fixed, we have to find the maximum of Σ_i x_i α_i l_i′f. Finally, introducing the notation f⊥ for the orthogonal projection of f onto L⊥, and choosing γ_i = x_i α_i, i = 1, ..., m, we have to find vectors l_1, ..., l_m from L⊥ such that (15) is satisfied and

Σ_{i=1}^{m} γ_i l_i′f⊥

is maximized. By the Cauchy-Schwarz-Buniakovskii inequality, Σ_i γ_i l_i′f⊥ ≤ ‖Σ_i γ_i l_i‖ ‖f⊥‖, with equality if and only if

Σ_{i=1}^{m} γ_i l_i = λ f⊥   (21)

with λ > 0. So we shall find l_1, ..., l_m from L⊥ such that (15) and (21) are satisfied. We see that

‖Σ_{i=1}^{m} γ_i l_i‖² = Σ_i Σ_k γ_i γ_k l_i′l_k

is determined by (15). Fix an orthonormal basis h_1, ..., h_{n−(p+1)} in L⊥ so that f⊥ is in the subspace spanned by the first m basis vectors. From now on we identify the vectors in L⊥ with their coordinate sequences according to the basis h_1, ..., h_{n−(p+1)}. Let D_0 denote the m × m matrix of the inner products l_i′l_k prescribed by (15); it is positive semidefinite because constraints (6)-(7) are assumed to be satisfiable. Find l_1, ..., l_m in the form

(l_1, ..., l_m) = ( U D_0^{1/2} ; O ),

where the upper block U D_0^{1/2} is of type m × m, U is an orthogonal matrix of type m × m, and O is a block of zeros. Then l_i′l_k = (D_0)_ik, i.e., (15) holds, and Σ_i γ_i l_i has first m coordinates U D_0^{1/2} Σ_i γ_i h_i and zeros elsewhere. We have to find U so that (21) is satisfied. This gives

U D_0^{1/2} Σ_{i=1}^{m} γ_i h_i = λ f̃⊥,   (22)

where h_i is the ith unit vector in R^m, while f̃⊥ consists of the first m coordinates of f⊥. As λ is chosen such that the vectors D_0^{1/2} Σ_{i=1}^{m} γ_i h_i and λ f̃⊥ have the same length, one can find an orthogonal matrix U satisfying (22). □
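The final step needs an orthogonal matrix mapping one vector to another of the same length; a Householder reflection is one standard way to construct such a U (the function name below is ours, not from the paper):

```python
import numpy as np

def reflection_mapping(a, c):
    """Orthogonal matrix U with U a = c, assuming ||a|| = ||c||.

    Householder reflection about the hyperplane orthogonal to a - c.
    """
    v = a - c
    norm_v = np.linalg.norm(v)
    if norm_v < 1e-12:                       # a and c already coincide
        return np.eye(len(a))
    v = v / norm_v
    return np.eye(len(a)) - 2.0 * np.outer(v, v)

a = np.array([3.0, 0.0, 4.0])                # ||a|| = 5
c = np.array([0.0, 5.0, 0.0])                # ||c|| = 5
U = reflection_mapping(a, c)

assert np.allclose(U @ a, c)                 # U maps a to c
assert np.allclose(U.T @ U, np.eye(3))       # U is orthogonal
```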

Numerical Results
In this section we compare three methods of prediction of the nonlinear functional g.
The first one is algorithm M, described in the constructive proof of Theorem 1 in Section 3. To obtain a feasible method, instead of y we use ŷ when we solve the optimization problem (2)-(4). Here

ŷ = g′(X_m′β̂),   (23)

where β̂ = (X′Σ^{-1}X)^{-1}X′Σ^{-1}z is the generalized least squares estimator of β. The resulting estimator is ĝ_opt = g(A_opt′z), where A_opt is the solution given by algorithm M if y in (2) is substituted by ŷ given in (23).
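The generalized least squares step can be sketched as follows (illustrative sizes and simulated data; the variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(5)
n, p1 = 12, 3                     # p1 plays the role of p + 1

X = rng.normal(size=(n, p1))
L = rng.normal(size=(n, n))
Sigma = L @ L.T + n * np.eye(n)   # a positive definite covariance matrix
beta_true = np.array([1.0, -2.0, 0.5])
z = X @ beta_true + np.linalg.cholesky(Sigma) @ rng.normal(size=n)

# Generalized least squares: beta_hat = (X' Sigma^{-1} X)^{-1} X' Sigma^{-1} z.
Si_X = np.linalg.solve(Sigma, X)
beta_hat = np.linalg.solve(X.T @ Si_X, Si_X.T @ z)

# The plug-in y_hat = g'(X_m' beta_hat) then replaces y in objective (2).
print(beta_hat)
```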
The second estimator is the one given by Aldworth and Cressie (2003); it is denoted by ĝ_ac. Here ĝ_ac = g(A_0′z), where A_0 is the coefficient matrix of the covariance-matching constrained kriging predictor of Aldworth and Cressie (2003); it is built from the matrix P in (5) and from the universal kriging predictor

ŝ_uk = X_m′β̂ + C′Σ^{-1}(z − Xβ̂)

of s. Finally, the third estimator is the naive estimator.
In the simulation, the covariance function of the random field δ_0 depends on t = (t_1, t_2) ∈ R² through ‖t‖ = (t_1² + t_2²)^{1/2}. (We assume that the covariance structure is completely known. Actually, δ_0 is an Ornstein-Uhlenbeck type random field; for the Ornstein-Uhlenbeck random field see Terdik and Woyczynski (2005) and the references therein.) The random variables δ_1(x, y), where (x, y) ∈ D, are independent and uniformly distributed.

Table 1: The relative errors of the estimators