A Global Bayes Factor for Observations on an Infinite-Dimensional Hilbert Space, Applied to Signal Detection in fMRI

Functional Magnetic Resonance Imaging (fMRI) is a fundamental tool in advancing our understanding of the brain’s functionality. Recently, a series of Bayesian approaches have been suggested to test for the voxel activation in different brain regions. In this paper, we propose a novel definition for the global Bayes factor to test for activation using the Radon-Nikodym derivative. Our proposed method extends the definition of Bayes factor to an infinite dimensional Hilbert space. Using this extended definition, a Bayesian testing procedure is introduced for signal detection in noisy images when both signal and noise are considered as an element of an infinite dimensional Hilbert space. This new approach is illustrated through a real data analysis to find activated areas of Brain in an fMRI data.


Introduction
Functional Magnetic Resonance Imaging (fMRI) measures the metabolic changes that occur within the brain in response to a stimulus. It is a fundamental tool in advancing our understanding of human brain functionality. FMRI is a sequence of magnetic resonance images (MRI). Each image contains a number of uniformly spaced volume elements, called pixels or voxels given the dimension of the image, that partition the brain into equal size boxes. The intensity of each pixel or voxel represents the spatial distribution of the nuclear spin density in that area. Changes in brain hemodynamics are due to the neuronal activity. These changes impact the local intensity of the MR signal. Therefore, changes in the pixel or voxel intensity across time can be used to infer when and where a particular activity happened. In last two decades, there have been many studies for finding activation in brain images obtained from fMRI. Lazar (2008) provided a comprehensive summary on statistical analysis of fMRI data. However, the drawback of traditional methods is the discretization of a continuous image in their analysis. Furthermore, these approaches make inference on a continuous object using tools that is originally introduced for discrete data. In current work, we propose a method that directly works with continuous subjects by assuming them to be infinite dimensional random variables.
A common statistical inference approach for analytic (continuous) statistical processes is based on the random fields theory. Level crossings (Friston, Frith, Liddle, and Frackowiak 1991;Friston, Holmes, Worsley, Poline, Frith, and Frackowiak 1994) and differential topology (Worsley, Evans, Marrett, and Neelin 1992) are the two examples of such approach. Recently, a series of Bayesian approaches have been suggested to test for brain activation (Penny and Friston 2003;Flandin and Penny 2007). These detection methods are limited to cases where the signal is observed for at most countable points and the alternative is also a simple hypothesis test. Common Bayesian non-decision theoretic tools for hypotheses testing are posterior probability and Bayes factors (Poor 2013). Makni, Idier, Vincent, Thirion, Dehaene-Lambertz, and Ciuciu (2008) developed a Bayesian framework to conciliate both detection and estimation issues. Kim, Smyth, and Stern (2006) used Bayesian nonparametric methods to automatically select the number of activated clusters in an image. Rohani, Shafie, and Noorbaloochi (2006) proposed a Bayesian procedure to detect signals existing within noisy images when the image is modeled as a scale space random field. In this paper, we extend the scale space results to a general setting using the Radon-Nikodym derivative. Furthermore, we extend the definition of the Bayes factor for testing the point null hypothesis. Using this extended definition, we introduce a Bayesian testing procedure for signal detection in noisy images when both signal and noise are considered as an element of an infinite dimensional Hilbert space. Our proposed method avoids errors that can happen in the discretization process and provides a more informative result.
The rest of the paper is organized as follows. Section 2 briefly reviews the definition of the Bayes factor for testing a sharp null hypothesis against its complement for infinite dimensional abstract spaces. Section 3 briefly reviews the classical results of Gaussian measures on Hilbert spaces, the equivalence and orthogonality of measures, and the derivation of the Radon-Nikodym derivatives. Section 4 applies the Bayes factor to Gaussian modeling of images as Hilbert valued random objects and the Bayesian method of signal detection is explained in detail. We applied our proposed method to two models in Section 5. Finally, Section 6 concludes the paper with a short discussion.

Bayesian testing of a sharp (point) null hypothesis
The common sharp (point) hypothesis testing conducts the test H 0 : θ = θ 0 against H 1 : θ = θ 0 (Lee 1997). Bayesian testing procedure for this procedure for a finite dimensions setting starts with a random vector, X, with density f (x|θ). The unobserved conditioning parameter θ is an unknown element of the parameter space Θ which is a subset of R n . Using the two stage Jeffreys prior (Jeffreys 1998), we let π 0 = P (H 0 ) be the prior probability assigned to H 0 and definitionine the prior probability over Θ to be where g(θ) is a conditional prior probability over the alternative values. The Bayes factor for assessing a null hypothesis H 0 versus an alternative H 1 is definitionined as Let g(θ) to be a continuous density on {θ ∈ Θ : θ = θ 0 } or Θ. Thus, for sharp hypothesis testing, the Bayes factor in equation (1) is equivalent to where m g (x) = f (x|θ)g(θ)dθ.
The Bayes factor definitionined in (2) combines prior information with the sample information in the likelihood. In other words, it can be considered as a weighted likelihood ratio of H 0 to H 1 . The weight function for a Bayes factor is the prior density, g, for the unknown parameter, θ. Thus, a Bayes factor offers a way of evaluating evidence against a null hypothesis. The interpretation of Bayes factors is similar to the usual likelihood ratio. For instance, if we observe a value of B = 1 10 means that H 1 is supported ten times as much by the data as is H 0 . In addition to a Bayes factor, the posterior probability of H 0 is also a relevant measure that can be used for the same purpose. In current setup, it is easy to see that the posterior probability is Following Grenander (1981), we extend the finite dimensional Bayes factor in equation (2) to a more general setting where the parameter and the sample spaces are infinite dimensions. Let (Ω, F) be a probability space, X be a random object (measurable mapping) from (Ω, F) to (X , F 1 ), Θ be an abstract parameter space, and F 2 be a σ-algebra of subsets of Θ. Suppose G is a probability measure on (Θ, F 2 ) and for each θ ∈ Θ, P θ is the probability measure induced by X on (X , F 1 ). In addition, for each A ∈ F 1 , let P θ (A) ∈ F 2 be a measurable function of θ.
. For a given θ 0 ∈ Θ, consider the measures P θ 0 and P G . According to the Lebesgue decomposition theorem, there exists a set A 0 ∈ F 1 with P G (A 0 ) = 0 and a nonnegative function f integrable with respect to P G such that for any A ∈ F 1 we have The function f is the Radon-Nikodym (R-N) derivative of P θ 0 with respect to P G and denoted by In this setting, for testing H 0 : θ = θ 0 versus H 1 : θ = θ 0 , we definitionine the Bayes factor to be Two problems that arise with this definition are how to obtain A 0 and how to calculate f . The set A 0 has either P θ 0 (A 0 ) = 1 or P θ 0 (A 0 ) = 0. If P θ 0 (A 0 ) = 1, the two measures P 0 and P G are orthogonal and a perfect decision about H 0 can be made. If P θ 0 and P G are equivalent then P θ 0 (A 0 ) = 0 and f = 0 with P G probability of 1. Grenander (1981) provides a simultaneous characterization of A 0 and f for cases where the observation x can be represented as a countable number of variates. In Section 4, we applied this notion to a common signal detection problem where a Gaussian measure model is used for both the observation and the conditional prior measure over some Hilbert spaces. In Section 3 we provide the model and preliminary results for our approach.

Gaussian measure on a Hilbert space
We are interested in characterizing the Bayes factor when observations are Hilbert valued random objects. Let H be a real separable Hilbert space with inner product operator < ·, · >, B be the class of all Borel subsets of H, and H * be the dual space of H. The characteristic function of a measure, m, on (H, B) for h ∈ H is Assuming m to be a probability measure on H, then ϕ is a continuous positive definitioninite functional on H and enjoys almost all properties of characteristic functions on Euclidean spaces. Varadhan (1968), Kuo (1975), and Bogachev (1998) provided a detailed review for the measures on Hilbert space. We need the following definitions and theorem for developing the method proposed in Section 4.
Definition 1. If m is a measure on H, then the mean function, µ, is an element of H definitionined by Definition 2. If m is a measure on H, then the covariance operator is a bilinear function on H definitionined by The covariance is a symmetric positive definitioninite bilinear form and the operator S definitionining the bilinear form is a positive semidefinitioninite Hermitian operator.
A Gaussian probability measure P on H is a measure such that its characteristic function has the form where µ and S are the mean and covariance of P , respectively. Following, we present the Theorems and Lemmas on the equivalence of two Gaussian measures on a Hilbert space which are needed for further developments. Proofs of these statements can be found in Varadhan (1968).
Theorem 1. Two Gaussian measures definitionined on a Hilbert space are either orthogonal or equivalent.
Suppose P 1 and P 2 are two Gaussian measures on H with means µ 1 and µ 2 and non-singular covariance operators S 1 and S 2 , respectively.
Theorem 5. Suppose P 1 = HN (µ, S 1 ) and P 2 = HN (µ, S 2 ) are equivalent. Then, the Radon-Nikodym derivative of P 1 with respect to P 2 denoted by f (x) is where, for i = 1, 2, · · · , ∞, µ i is the mean value of Z i (x) given P 1 . In equation (5), where, for j = 1, 2, · · · ∞, u j (x) =< e j , x > / √ s j with e j and s j denoting the eigenfunctions and their corresponding eigenvalues of S 1 , respectively. Furthermore, λ j and g j are the eigenfunctions and eigenvalues of T , respectively, where T is such that S 2 = S 1/2 1 T S 1/2 1 , and g j = ∞ k=1 a jk e k .

Signal detection
Let (T , F, m) be a measurable space and L 2 (T , m) be a Hilbert space of squared integrable real functions on T . Let X = Θ = L 2 (T , m) and F 1 and F 2 be the Borel sigma field of L 2 (T, m). For each θ ∈ Θ, let P θ be a Gaussian measure with mean θ and covariance operator, ρ. In addition, let G, the prior measure on Θ, to be a Gaussian measure with known mean µ ∈ Θ and covariance operator, τ . Using the characteristic function approach, it can be easily proved that the marginal measure of X, P G , is also a Gaussian measure with the mean µ and covariance operator ρ + τ .
For Bayesian hypothesis testing of H 0 : θ = 0 versus H 1 : θ = 0, we should know if P 0 and P G are equivalent or orthogonal. If they are orthogonal, then a perfect decision can be made between H 0 and H 1 . Otherwise, we need to derive the R-N derivative of P 0 with respect to P G . The following result provides a sufficient condition.
Proof. Suppose P 0 and P G are equivalent. Let G = HN (0, τ ), then according to Lemma 1 and Theorem 1 P 0 and P G are equivalent. From Lemma 2, there exists an invertible bounded Hermitian operator T such that (T − I) is of Hilbert-Schmidt class and ρ + τ = ρ 1/2 T ρ 1/2 , On the other hand, since P 0 and G are equivalent, there exists an invertible bounded Hermitian operator D such that (D − I) is of Hilbert-Schmidt class and τ = ρ 1/2 Dρ 1/2 Comparing (6) and (7) shows both operators T − I and T − 2I are of Hilbert-Schmidt class which is a contradiction. The following example shows that the orthogonality of P 0 and G does not yield the equivalence of P 0 and P G .
Example. Let P 0 = HN (0, ρ) and G = HN (µ, ρ) where µ / ∈ R(ρ 1/2 ). In this case, according to Theorem 2, P 0 and G are orthogonal. If P 0 and P G = HN (µ, 2ρ) are equivalent, then by Theorem 3, so are P 0 = HN (0, ρ) and G = HN (µ, ρ), which is a contradiction. This example can be generalized for any P 0 = HN (0, ρ) and G = HN (µ, τ ), where µ / ∈ R(ρ 1/2 ). In the following theorem, we consider a special case where P 0 and P G are equivalent and the R-N derivative has a simple form of the exponential of a diagonal form. A general form of the theorem is proved in Varberg (1967) and Chung and Rajput (1981). Here, we give a direct proof for an special case.
Theorem 7. Let H = L 2 (T , m), where T ⊂ R n is a compact set and m is the Lebesgue measure. In addition, let P 0 = HN (0, ρ) and G = HN (0, τ ), where ρ is continuous on , with {λ i } and {φ i } are eigenvalues and the corresponding eigenfunctions of the operator induced by ρ, respectively, and λ > 0 is such that for i = 1, 2, · · · , 1 − λλ i > 0. Then, P 0 and P G are equivalent and where is bounded and therefore the series in the definition of τ is absolutely convergent. In addition, ρ is continuous, so using the Mercer's theorem, we have Let γ ij = (1 − λλ i )δ ij where δ is the Kroneker delta. Then, the operator Γ definitionined by [γ ij ] ∞ i,j=1 is invertible, bounded, and since the (Γ − I) is a Hilbert-Schmidt operator. It can be easily proved that ρ = r 1/2 Γr 1/2 . Hence, by Lemma 2, P 0 and P G are equivalent.
To obtain the R-N derivative, note that Γ and r have the same eigenfunctions φ i and their corresponding eigenvalues are {1 − λλ i } and { λ i 1−λλ i }, respectively. Therefore, correspond to a ij , g i , s i , e i , and λ i in Theorem 5 are 1, φ i , λ i /(1 − λλ i ), φ i , and {1 − λλ i }, respectively. Hence, If we substitute all these in (5), we get the result.
Using Theorem 4, we have the following for the case µ = 0.

Application to fMRI data
In this section, we apply the proposed methodology on a fMRI scan of the human brain. Note that the main purpose of this data analysis is to show that proposed approach can be used for fMRI data analysis. One of the common object in using fMRI is to locate the regions of the brain that respond to a simple visual stimulus (Kwong, Belliveau, Chesler, Goldberg, Weisskoff, Poncelet, Kennedy, Hoppel, Cohen, and Turner 1992). In an experiment at the Montreal Neurological Institute, a subject was given a simple visual stimulus, flashing red dots, presented through light-tight goggles (Ouyang, Pike, and Evans 1994). The stimulus was switched off for 4 scans, then on for 4 scans. This procedure was repeated 5 times which resulted in total of 40 scans. The time interval between pair of scans was 6 seconds and the stimulus period was T = 48 seconds. Hence, the data set consists of a time series of 40 2-D images, each 128 × 128 pixels of size 2 mm.
This data set has been analyzed by several researchers using the χ 2 scale space method (Worsley 2001), rotation space random field method (Shafie, Sigal, Siegmund, and Worsley 2003), and Bayesian method (Rohani et al. 2006). To apply the random field methodology, these researchers fitted a sine wave at each fixed pixel and extracted two images. These images were referred to as sine and cosine components of the data. The cosine component is shown in Figure 1. Here, we apply the proposed Bayes factor method to this image to find out if there is any sign of activation caused by the stimulus.
The Matched Filter Theorem justifies smoothing the image before any analysis with a filter of the form of signal. The smoothing assumption is essential for methods proposed by Worsley (2001) and Shafie et al. (2003). In order to get a smooth random field, the image is smoothed before analyzing data with a Gaussian filter. However, in practice the shape of the signal is not known. Therefore, smoothing with a kernel is not quite justifiable. Note that in our approach, we do not need this smoothing procedure.
We will consider the following two models: where T = [0, 1] × [0, 1], θ ∈ L 2 (T , m), and Z is Gaussian white noise in T . Under these assumptions, P 0 = HN (0, ρ) with ρ((t 1 , t 2 ), (s 1 , s 2 )) = 1 t 1 = s 1 , t 2 = s 2 0 otherwise This model is the same as the one used by Siegmund and Worsley (1995). The main difference is that here, the image is not pre-smoothed for the analysis. The model does not fit the implicit assumptions in section 4 as white noise cannot be realized in the Hilbert space. However, the theory can be extended to cover this case. For further information, see Balakrishnan (2012) and Adler and Taylor (2009).
Suppose θ follows a HN (µ, τ ) where µ(t) 2 dt < ∞, then P 0 = HN (0, I) and P G = HN (µ, τ + I). It is clear that P 0 and P G are equivalent and we have where for i = 1, 2, . . . , ∞, Z i =< x, e i >, µ i =< µ, e i >, and e i s are the eigenvalues and corresponding eigenfunctions of τ , respectively. For the one dimensional case, f (x) is similar to the criterion used for the detection of random Gaussian signal in Gaussian noise in signal processing literature. For further information, see Poor (2013). For a specific model, consider a Brownian sheet prior with mean 0 and the following covariance function τ ((t 1 , t 2 ), (s 1 , s 2 )) = min(t 1 , s 1 ) min(t 2 , s 2 ).
Since the assumption of white noise may not be very realistic, we consider the following model.
This means that for the values of 0 < λ < 81/16, having HN (0, τ ) as a prior for θ, we fail to reject the null hypothesis of no signal.

Conclusion
Research has greatly improved our understanding of the human brain. One of the common tools in brain image analysis is fMRI. However, relatively few studies have looked at this imaging technique as giving an observation as an infinite dimensional Hilbert random variable.
In this paper, we extended the definition of the Bayes factor to a more general setting where the variables of interest can be members of an infinite dimensional Hilbert space. In this paper, our focus is to provide inferential tools for analyzing images obtained through fMRI techniques, but the approach might be used for analyzing other kinds of images or other kinds of data that can be expressed in the form of an image.
For the real data analysis, the Bayes factors for Models I and II find a small sign of a signal in Figure 1, which appears to show a signal. The Bayes factors are 1.0045 for Model I, and between 1 and about 0.70 for Model II. This would seem to suggest that this approach has low power, at least for the cosine component of this dataset. There are three possible explanations for why this might have happened.
i The power might be greater if the time series for each voxel was reduced to a 2-D image using something other than the cosine component.
ii In practice, to increase the signal-to-noise ratio, images are first smoothed using a smooth kernel. The models we have considered here, models I and II, are missing this feature.
iii The original fMRI dataset consists of a time series of images. In our models, the time series of the images was reduced to a 2-D image. If we consider the time series of the images, as a time series of Hilbert valued variables, we might have better detected the signal.
The last two explanations are the topic of our current research.