Maximum likelihood drift estimation for Gaussian process with stationary increments

The paper deals with the regression model $X_t = \theta t + B_t$, $t\in[0, T ]$, where $B=\{B_t, t\geq 0\}$ is a centered Gaussian process with stationary increments. We study the estimation of the unknown parameter $\theta$ and establish the formula for the likelihood function in terms of a solution to an integral equation. Then we find the maximum likelihood estimator and prove its strong consistency. The results obtained generalize the known results for fractional and mixed fractional Brownian motion.


Introduction
We study the problem of the drift parameter estimation for the stochastic process
$$X_t = \theta t + B_t, \quad t \in [0, T], \qquad (1)$$
where $\theta \in \mathbb{R}$ is an unknown parameter, and $B = \{B_t, t \ge 0\}$ is a centered Gaussian process with stationary increments, $B_0 = 0$. In the particular case when $B = B^H$ is a fractional Brownian motion, this model has been studied by many authors. We mention the paper Norros et al. (1999), which treats the maximum likelihood estimation by continuous observations of the trajectory of $X$ on the interval $[0, T]$ (see also Le Breton (1998)). Further, the paper Hu et al. (2011) investigates the exact maximum likelihood estimator by discrete observations at the points $t_k = kh$, $k = 1, 2, \dots, N$; the paper Bertin et al. (2011) considers the maximum likelihood estimation in the discrete scheme of observations, where the trajectory of $X$ is observed at the points $t_k = \frac{k}{N}$, $k = 1, 2, \dots, N^\alpha$, $\alpha > 1$. For hypothesis testing of the sign of the drift parameter in the model (1) driven by a fractional Brownian motion, see Stiburek (2017). The paper Cai et al. (2016) treats the likelihood function for Gaussian processes that do not necessarily have stationary increments. However, our approach differs from theirs: on the one hand, it cannot be deduced from their general formulas; on the other hand, it gives rather elegant representations. The construction of the maximum likelihood estimator in the case when $B$ is the sum of two fractional Brownian motions was studied in Mishura (2016) and Mishura and Voronov (2015). A similar non-Gaussian model driven by the Rosenblatt process was considered in Bertin et al. (2011).
As already mentioned, we consider the case when $B$ is a centered Gaussian process with stationary increments. We construct the maximum likelihood estimators for both discrete and continuous schemes of observations. The assumptions on the process in the continuous case are formulated in terms of the second derivative of its covariance function, see Assumptions 1 and 2. The exact formula for the maximum likelihood estimator involves the solution of an integral equation whose kernel is obtained by this differentiation. We give sufficient conditions for the strong consistency of the estimators. Several examples of the process $B$ are considered.
The paper is organized as follows. Section 2 is devoted to the case of the discrete observations. The maximum likelihood estimation for continuous time is studied in Section 3.

Maximum likelihood estimation by discrete observations
We start with the construction of the likelihood function and the maximum likelihood estimator in the case of discrete observations. In the next section these results will be used for the derivation of the likelihood function in the continuous-time case, see the proof of Theorem 3.3.
Let the process $X$ defined by formula (1) be observed at the points $t_k$, $k = 0, 1, \dots, N$. The problem is to estimate the parameter $\theta$ from the observations $X_{t_k}$, $k = 0, 1, \dots, N$, of the process $X$.
2.1. Likelihood function and construction of the estimator. Denote $\Delta X^{(N)} = \left(X_{t_k} - X_{t_{k-1}}\right)_{k=1}^{N}$ and $\Delta B^{(N)} = \left(B_{t_k} - B_{t_{k-1}}\right)_{k=1}^{N}$. Note that in our model $X_{t_0} = X_0 = 0$, and the $N$-dimensional vector $\Delta X^{(N)}$ is a one-to-one function of the observations. The vectors $\Delta B^{(N)}$ and $\Delta X^{(N)}$ are Gaussian with different means (except in the case $\theta = 0$) and the same covariance matrix. We denote this covariance matrix by $\Gamma^{(N)}$. The maximum likelihood estimator below coincides with the least squares estimator considered in Rao (2002, eq. (4a.1.5)).
Lemma 2.1. Assume that the Gaussian distribution of the vector $(B_{t_k})_{k=1}^N$ is nonsingular. Then one can take the function
$$L^{(N)}(\theta) = \exp\left\{\theta\, z^\top \left(\Gamma^{(N)}\right)^{-1} \Delta X^{(N)} - \frac{\theta^2}{2}\, z^\top \left(\Gamma^{(N)}\right)^{-1} z\right\}, \qquad (2)$$
where $z = (t_k - t_{k-1})_{k=1}^N$, as a likelihood function in the discrete-time model. The MLE is linear with respect to the observations and equals
$$\hat\theta^{(N)} = \frac{z^\top \left(\Gamma^{(N)}\right)^{-1} \Delta X^{(N)}}{z^\top \left(\Gamma^{(N)}\right)^{-1} z}. \qquad (3)$$
Proof. The pdf of $\Delta B^{(N)}$ with respect to the Lebesgue measure equals
$$f(x) = \left((2\pi)^N \det \Gamma^{(N)}\right)^{-1/2} \exp\left\{-\tfrac{1}{2}\, x^\top \left(\Gamma^{(N)}\right)^{-1} x\right\}.$$
The density of the observations for given $\theta$ with respect to the distribution of the observations for $\theta = 0$ is taken as a likelihood function.
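As a concrete illustration of Lemma 2.1, the following sketch (with hypothetical helper names, not from the paper) evaluates the estimator for the benchmark case where $B$ is a fractional Brownian motion observed on a regular grid; the GLS-type form $\hat\theta^{(N)} = z^\top(\Gamma^{(N)})^{-1}\Delta X^{(N)} \big/ z^\top(\Gamma^{(N)})^{-1} z$ is assumed for (3).

```python
import numpy as np

def fbm_increment_cov(N, H=0.7, h=1.0):
    """Covariance matrix of fBm increments on a regular grid of step h.

    Illustrative choice of B; the autocovariance at lag j is
    gamma(j) = (h^{2H}/2) (|j+1|^{2H} + |j-1|^{2H} - 2|j|^{2H}).
    """
    j = np.arange(N)
    gamma = 0.5 * h**(2 * H) * (
        np.abs(j + 1.0)**(2 * H) + np.abs(j - 1.0)**(2 * H) - 2.0 * np.abs(j)**(2 * H)
    )
    # Stationary increments make Gamma a Toeplitz matrix (cf. Remark 2.2).
    return gamma[np.abs(np.subtract.outer(j, j))]

def mle_discrete(dX, Gamma, z):
    """GLS-type MLE: theta_hat = z^T Gamma^{-1} dX / (z^T Gamma^{-1} z)."""
    w = np.linalg.solve(Gamma, z)
    return (w @ dX) / (w @ z)
```

In a noise-free sanity check, increments equal to $\theta z$ recover $\theta$ exactly; with Gaussian noise added, the estimator remains linear in the observations.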
Remark 2.2. Let the process $X$ be observed on a regular grid, i.e., at the points $t_k = kh$, $k = 1, \dots, N$, where $h > 0$. Then $\Gamma^{(N)}$ is a Toeplitz matrix, that is, $\Gamma^{(N)}_{k,l} = \operatorname{cov}\left(B_{kh} - B_{(k-1)h},\, B_{lh} - B_{(l-1)h}\right)$ depends only on $k - l$, due to the stationarity of increments. This simplifies the numerical computation of the MLE.

2.2. Properties of the estimator. Since $\Delta X^{(N)} = \Delta B^{(N)} + \theta z$, the maximum likelihood estimator (3) equals
$$\hat\theta^{(N)} = \theta + \frac{z^\top \left(\Gamma^{(N)}\right)^{-1} \Delta B^{(N)}}{z^\top \left(\Gamma^{(N)}\right)^{-1} z}.$$
Lemma 2.3. Under the assumptions of Lemma 2.1, the estimator $\hat\theta^{(N)}$ is unbiased and normally distributed. Its variance equals
$$\operatorname{var} \hat\theta^{(N)} = \left(z^\top \left(\Gamma^{(N)}\right)^{-1} z\right)^{-1}.$$
Proof. The estimator $\hat\theta^{(N)}$ is unbiased and normally distributed because $\hat\theta^{(N)} - \theta$ is a linear functional of the centered Gaussian vector $\Delta B^{(N)}$. The variance of the estimator is equal to
$$\operatorname{var} \hat\theta^{(N)} = \frac{z^\top \left(\Gamma^{(N)}\right)^{-1} \Gamma^{(N)} \left(\Gamma^{(N)}\right)^{-1} z}{\left(z^\top \left(\Gamma^{(N)}\right)^{-1} z\right)^2} = \left(z^\top \left(\Gamma^{(N)}\right)^{-1} z\right)^{-1}.$$
To prove the consistency of the estimator, we need the following technical result.
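A short Monte Carlo sketch can check the unbiasedness and the variance formula of Lemma 2.3 numerically, again for the illustrative fBm case and assuming $\operatorname{var}\hat\theta^{(N)} = (z^\top(\Gamma^{(N)})^{-1}z)^{-1}$; the setup and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
N, H, h, theta = 20, 0.7, 1.0, 1.5

# Toeplitz covariance of fBm increments (illustrative choice of the process B).
j = np.arange(N)
gamma = 0.5 * h**(2 * H) * (np.abs(j + 1.0)**(2 * H) + np.abs(j - 1.0)**(2 * H)
                            - 2.0 * np.abs(j)**(2 * H))
Gamma = gamma[np.abs(np.subtract.outer(j, j))]

z = np.full(N, h)
w = np.linalg.solve(Gamma, z)
var_theory = 1.0 / (w @ z)                 # (z^T Gamma^{-1} z)^{-1}

# Simulate M replications of Delta B ~ N(0, Gamma) and form the MLE each time.
L = np.linalg.cholesky(Gamma)
M = 4000
dB = rng.standard_normal((M, N)) @ L.T
dX = dB + theta * z
est = (dX @ w) / (w @ z)
```

The empirical mean of `est` matches $\theta$ and its empirical variance matches `var_theory` within Monte Carlo error, in line with Lemma 2.3.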
Lemma 2.4. Let $A$ be a symmetric positive definite $N \times N$ matrix and $x, y \in \mathbb{R}^N$. Then
$$\left(x^\top y\right)^2 \le \left(x^\top A x\right)\left(y^\top A^{-1} y\right).$$
Proof. As the matrix $A$ is positive definite, $x^\top A x > 0$, and there exists a positive definite matrix $A^{1/2}$ (which is therefore symmetric and nonsingular) such that $(A^{1/2})^2 = A$. By the Cauchy–Schwarz inequality,
$$\left(x^\top y\right)^2 = \left(\left(A^{1/2} x\right)^\top A^{-1/2} y\right)^2 \le \left(x^\top A x\right)\left(y^\top A^{-1} y\right),$$
whence the desired inequality follows.
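The inequality behind this proof is presumably the matrix Cauchy–Schwarz bound $(x^\top y)^2 \le (x^\top A x)(y^\top A^{-1} y)$ for positive definite $A$. A small numerical sanity check (hypothetical code, not from the paper):

```python
import numpy as np

def cs_matrix_inequality_holds(A, x, y, tol=1e-9):
    """Check (x^T y)^2 <= (x^T A x)(y^T A^{-1} y) for symmetric positive definite A.

    Proof idea: x^T y = (A^{1/2} x)^T (A^{-1/2} y), then apply Cauchy-Schwarz.
    """
    lhs = (x @ y) ** 2
    rhs = (x @ A @ x) * (y @ np.linalg.solve(A, y))
    return lhs <= rhs * (1.0 + tol)

rng = np.random.default_rng(42)
n = 6
checks = []
for _ in range(200):
    B = rng.standard_normal((n, n))
    A = B @ B.T + n * np.eye(n)        # symmetric positive definite by construction
    x, y = rng.standard_normal(n), rng.standard_normal(n)
    checks.append(cs_matrix_inequality_holds(A, x, y))
```

Equality is attained when $y$ is proportional to $A x$, i.e., when $A^{1/2}x$ and $A^{-1/2}y$ are collinear.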
In the rest of this section we assume that the process $X$ is observed on a regular grid, at the points $t_k = kh$, $k = 1, \dots, N$, for some $h > 0$. We also assume that for any $N$ the Gaussian distribution of the vector $(B_{kh})_{k=1}^N$ is nonsingular.
Theorem 2.5. Assume that the autocovariance function of the increments tends to zero:
$$\gamma(k) := \operatorname{cov}\left(B_h,\, B_{(k+1)h} - B_{kh}\right) \to 0, \quad k \to \infty.$$
Let $\hat\theta^{(N)}$ be the ML estimator of the parameter $\theta$ of the model (1) by the observations $X_{kh}$, $k = 1, \dots, N$. Then the estimator $\hat\theta^{(N)}$ is mean-square consistent, i.e., $\operatorname{var} \hat\theta^{(N)} \to 0$ as $N \to \infty$.
Proof. By Remark 2.2, the matrix $\Gamma^{(N)}$ has Toeplitz structure: $\Gamma^{(N)}_{k,l} = \gamma(k - l)$ (where $\gamma(-k) = \gamma(k)$) does not depend on $N$ as soon as $N \ge \max(k, l)$. By the Toeplitz theorem, the Cesàro means of $\gamma$ vanish in the limit:
$$\frac{1}{N} \sum_{k=0}^{N-1} \gamma(k) \to 0, \quad N \to \infty.$$
For a regular grid we have $z = (h, \dots, h)^\top$. Hence, in this case, by Lemma 2.4 with $x = y = z$ and $A = \Gamma^{(N)}$,
$$\left(z^\top z\right)^2 \le \left(z^\top \Gamma^{(N)} z\right) \left(z^\top \left(\Gamma^{(N)}\right)^{-1} z\right).$$
Finally, with the use of Lemma 2.3,
$$\operatorname{var} \hat\theta^{(N)} = \left(z^\top \left(\Gamma^{(N)}\right)^{-1} z\right)^{-1} \le \frac{z^\top \Gamma^{(N)} z}{\left(z^\top z\right)^2} = \frac{1}{N^2 h^2} \sum_{k,l=1}^{N} \gamma(k - l) \to 0, \quad N \to \infty.$$
To prove the strong consistency, we need the following auxiliary statement.
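The decay of the exact variance can be observed numerically. The sketch below (hypothetical, with fBm increments as the illustrative process and the variance formula of Lemma 2.3 assumed) evaluates $\operatorname{var}\hat\theta^{(N)}$ for growing $N$:

```python
import numpy as np

def mle_variance(N, H=0.7, h=1.0):
    """Exact MLE variance (z^T Gamma^{-1} z)^{-1} on a regular grid, fBm increments."""
    j = np.arange(N)
    gamma = 0.5 * h**(2 * H) * (np.abs(j + 1.0)**(2 * H) + np.abs(j - 1.0)**(2 * H)
                                - 2.0 * np.abs(j)**(2 * H))
    Gamma = gamma[np.abs(np.subtract.outer(j, j))]
    z = np.full(N, h)
    return 1.0 / (z @ np.linalg.solve(Gamma, z))

variances = [mle_variance(N) for N in (5, 20, 80, 320)]
```

The sequence decreases toward zero, illustrating the mean-square consistency; for this fBm example the decay is of order $N^{2H-2}$.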
Lemma 2.6. The Gaussian process $\{\hat\theta^{(N)}, N = 1, 2, \dots\}$ has independent increments.
Proof. In the next paragraph, $N_2 \le N_3$ are positive integers, $I_{N_2 \times N_3}$ is the $N_2 \times N_3$ diagonal matrix with ones on the diagonal, and $I_{N_3 \times N_2}$ denotes its transpose. A direct computation of the covariances shows that the Gaussian process $\{\hat\theta^{(N)}, N = 1, 2, \dots\}$ has uncorrelated increments. Hence its increments are independent.
Theorem 2.7. Under the assumptions of Theorem 2.5, the estimator $\hat\theta^{(N)}$ is strongly consistent: $\hat\theta^{(N)} \to \theta$ almost surely as $N \to \infty$.
Proof. By Theorem 2.5, $\operatorname{var} \hat\theta^{(N)} \to 0$ as $N \to \infty$. The process $\hat\theta^{(N)}$ has independent increments. Therefore, by Kolmogorov's inequality, for $\varepsilon > 0$ and $N \in \mathbb{N}$,
$$\mathsf{P}\left(\sup_{n \ge N} \left|\hat\theta^{(n)} - \hat\theta^{(N)}\right| \ge \varepsilon\right) \le \frac{\operatorname{var} \hat\theta^{(N)}}{\varepsilon^2} \to 0, \quad N \to \infty.$$
Then, using the unbiasedness of the estimator, we get $\hat\theta^{(N)} \to \theta$ a.s.
Example. Let $B = B^{H_1} + B^{H_2}$ be the sum of two independent fractional Brownian motions with Hurst indices $H_1, H_2 \in (0, 1)$. These processes have stationary increments, and
$$\operatorname{cov}\left(B^{H}_1,\, B^{H}_{k+1} - B^{H}_{k}\right) = \frac{1}{2}\left((k+1)^{2H} - 2k^{2H} + (k-1)^{2H}\right) \to 0, \quad k \to \infty,$$
see, e.g., Mishura (2008, Sec. 1.2). Taking into account the independence of the centered processes $B^{H_1}_t$ and $B^{H_2}_t$, we obtain that the autocovariance of the increments of $B$ tends to zero as $k \to \infty$. Thus, the assumptions of Theorem 2.5 are satisfied.
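For the example of a sum of two independent fractional Brownian motions, the increment autocovariance can be evaluated explicitly as the sum of two fractional Gaussian noise autocovariances. A sketch (illustrative Hurst indices, hypothetical names):

```python
import numpy as np

def fgn_autocov(k, H):
    """Autocovariance of unit-step fBm increments at lag k:
    0.5 * ((k+1)^{2H} + (k-1)^{2H} - 2 k^{2H})."""
    k = np.asarray(k, dtype=float)
    return 0.5 * (np.abs(k + 1)**(2 * H) + np.abs(k - 1)**(2 * H) - 2 * np.abs(k)**(2 * H))

def mixed_autocov(k, H1=0.6, H2=0.8):
    """Increments of B = B^{H1} + B^{H2} with independent summands: covariances add."""
    return fgn_autocov(k, H1) + fgn_autocov(k, H2)

lags = np.array([1, 10, 100, 1000])
rho = mixed_autocov(lags)
```

`rho` decreases toward zero (asymptotically like $H(2H-1)k^{2H-2}$ for the dominant term), so the assumption of Theorem 2.5 holds for this mixed process.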

Maximum likelihood estimation by continuous observations
Let the process $X$ be observed on the whole interval $[0, T]$. It is required to estimate the unknown parameter $\theta$ by these observations.
3.1. Likelihood function and construction of the estimator. In this section we derive a formula for the continuous-time MLE, similar to formula (3) for the discrete case.
Assumption 1. The covariance function $R(s, t) = \operatorname{cov}(B_s, B_t)$ has a mixed second derivative
$$\frac{\partial^2 R(s, t)}{\partial s\, \partial t} = K(t - s),$$
where $K(t)$ is an even function, $K \in L_1[-T, T]$.
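For instance, for fractional Brownian motion with $H > 1/2$ the kernel is $K(t) = H(2H-1)|t|^{2H-2}$: even and singular at the origin, yet integrable, with $\int_{-T}^{T} K(t)\,dt = 2H\,T^{2H-1}$. A quick numerical check of this integrability (hypothetical sketch):

```python
import numpy as np

H, T = 0.7, 1.0
K = lambda t: H * (2 * H - 1) * np.abs(t)**(2 * H - 2)   # fBm kernel, valid for H > 1/2

# The midpoint rule copes with the integrable singularity at t = 0.
n = 400_000
t = (np.arange(n) + 0.5) * (T / n)
numeric = 2.0 * np.sum(K(t)) * (T / n)   # K is even: integrate over [0, T] and double
exact = 2.0 * H * T**(2 * H - 1)         # closed form of int_{-T}^{T} K(t) dt
```

The midpoint sum converges to the closed-form value despite the singularity, confirming $K \in L_1[-T, T]$ for this kernel.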
Lemma 3.1. Under Assumption 1, the integral $\int_0^T f(t)\, dB_t$ exists as the mean-square limit of the corresponding Riemann sums for any $f \in L_2[0, T]$. Moreover,
$$\mathsf{E}\left(\int_0^T f(t)\, dB_t\right)^2 = \int_0^T \int_0^T f(t) f(s) K(t - s)\, ds\, dt. \qquad (4)$$
Proof. According to Huang and Cambanis (1978), the integral exists and (4) holds provided that the double integral in (4) converges absolutely. Using the properties of a convolution (Young's inequality), one can prove that
$$\int_0^T \int_0^T \left|f(t) f(s) K(t - s)\right| ds\, dt \le \|f\|_{L_2[0,T]}^2 \, \|K\|_{L_1[-T,T]} < \infty.$$

Define a linear operator $\Gamma_T \colon L_2[0, T] \to L_2[0, T]$ by
$$(\Gamma_T f)(t) = \int_0^T K(t - s) f(s)\, ds, \quad t \in [0, T].$$
It follows from (5) that
$$\mathsf{E}\left(\int_0^T f(t)\, dB_t \int_0^T g(s)\, dB_s\right) = \left\langle \Gamma_T f, g \right\rangle_{L_2[0,T]}. \qquad (6)$$
The basic properties of the operator $\Gamma_T$ are collected in the following evident lemma.
Now we are ready to formulate our key assumption on the kernel $K$ (in terms of the operator $\Gamma_T$).
Assumption 2. The integral equation
$$\Gamma_T h_T = \mathbf{1}_{[0,T]}$$
has a solution $h_T \in L_2[0, T]$.
Theorem 3.3. Under Assumptions 1 and 2, the function
$$L(\theta) = \exp\left\{\theta \int_0^T h_T(t)\, dX_t - \frac{\theta^2}{2} \int_0^T h_T(t)\, dt\right\} \qquad (7)$$
is a likelihood function.
Proof. Let us show that the function $L(\theta)$ defined in (7) is the density of the distribution of the process $X$ for given $\theta$ with respect to the distribution of the process $B$ (the latter coincides with the distribution of $X$ when $\theta = 0$). In other words, we need to prove that
$$\mathsf{P}_\theta(A) = \int_A L(\theta)\, d\mathsf{P}_0, \qquad (8)$$
where $\mathsf{P}_\theta$ is the probability measure that corresponds to the value of the parameter $\theta$. It suffices to show that the equality (8) holds for all partitions $0 = t_0 < t_1 < \dots < t_N \le T$ of the interval $[0, T]$ and for all cylinder sets $A \in \mathcal{F}_N$, where $\mathcal{F}_N$ is the $\sigma$-algebra generated by the values $B_{t_k}$ of the process $B$ at the points $t_k$, $k = 1, \dots, N$. We have
$$\mathsf{P}_\theta(A) = \int_A L^{(N)}(\theta)\, d\mathsf{P}_0,$$
where $L^{(N)}$ is the likelihood function (2) for the discrete-time model. To prove (8), it therefore suffices to show that
$$\mathsf{E}_0\left(L(\theta) \mid \mathcal{F}_N\right) = L^{(N)}(\theta).$$
Due to the joint normality of $\int_0^T h_T(s)\, dB_s$ and $\Delta B^{(N)}$, the conditional distribution of $\int_0^T h_T(s)\, dB_s$ with respect to $\mathcal{F}_N$ is Gaussian (Anderson, 2003, Theorem 2.5.1); its conditional variance is nonrandom. Let us find its parameters. By the least squares method,
$$\mathsf{E}\left(\int_0^T h_T(s)\, dB_s \,\Big|\, \mathcal{F}_N\right) = \operatorname{cov}\left(\int_0^T h_T(s)\, dB_s,\, \Delta B^{(N)}\right) \left(\Gamma^{(N)}\right)^{-1} \Delta B^{(N)}.$$
We have $\operatorname{cov}\left(\Delta B^{(N)}, \Delta B^{(N)}\right) = \Gamma^{(N)}$. Calculate $\operatorname{cov}\left(\int_0^T h_T(s)\, dB_s,\, \Delta B^{(N)}\right)$: by (6), for any $f \in L_2[0, T]$,
$$\operatorname{cov}\left(\int_0^T f(s)\, dB_s,\, B_{t_k} - B_{t_{k-1}}\right) = \left\langle \Gamma_T f, \mathbf{1}_{[t_{k-1}, t_k]}\right\rangle = \int_{t_{k-1}}^{t_k} (\Gamma_T f)(s)\, ds = \left(M \Gamma_T f\right)_k,$$
where $M$ denotes the map $f \mapsto \left(\int_{t_{k-1}}^{t_k} f(s)\, ds\right)_{k=1}^N$, and where we have used that the operator $\Gamma_T$ is self-adjoint. Further, $M \Gamma_T h_T = M \mathbf{1}_{[0,T]} = z$, where the vector $z$ is defined after (2). Hence,
$$\mathsf{E}\left(\int_0^T h_T(s)\, dB_s \,\Big|\, \mathcal{F}_N\right) = z^\top \left(\Gamma^{(N)}\right)^{-1} \Delta B^{(N)}.$$
In order to calculate the conditional variance, we apply the partition-of-variance equality (law of total variance). We have
$$\operatorname{var} \int_0^T h_T(s)\, dB_s = \langle \Gamma_T h_T, h_T\rangle = \int_0^T h_T(t)\, dt, \qquad \operatorname{var}\left(z^\top \left(\Gamma^{(N)}\right)^{-1} \Delta B^{(N)}\right) = z^\top \left(\Gamma^{(N)}\right)^{-1} z.$$
Hence,
$$\operatorname{var}\left(\int_0^T h_T(s)\, dB_s \,\Big|\, \mathcal{F}_N\right) = \int_0^T h_T(t)\, dt - z^\top \left(\Gamma^{(N)}\right)^{-1} z.$$
Applying the formula for the mean of the log-normal distribution, we obtain
$$\mathsf{E}_0\left(L(\theta) \mid \mathcal{F}_N\right) = \exp\left\{\theta\, z^\top \left(\Gamma^{(N)}\right)^{-1} \Delta B^{(N)} + \frac{\theta^2}{2}\left(\int_0^T h_T(t)\, dt - z^\top \left(\Gamma^{(N)}\right)^{-1} z\right) - \frac{\theta^2}{2} \int_0^T h_T(t)\, dt\right\} = L^{(N)}(\theta).$$
Thus, (8) is proved.
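Assumption 2 can be explored numerically: discretize the integral equation $\Gamma_T h_T = \mathbf{1}_{[0,T]}$ on a grid and solve the resulting linear system. The sketch below uses the smooth illustrative kernel $K(u) = e^{-|u|}$ (a hypothetical choice for demonstration, not an example from the paper) with a Nyström-type midpoint discretization:

```python
import numpy as np

T, n = 1.0, 500
dt = T / n
t = (np.arange(n) + 0.5) * dt              # midpoints of a uniform grid on [0, T]
K = lambda u: np.exp(-np.abs(u))           # illustrative even kernel, K in L1[-T, T]

# Discretize (Gamma_T h)(t) = int_0^T K(t - s) h(s) ds = 1 on [0, T].
A = K(np.subtract.outer(t, t)) * dt
h = np.linalg.solve(A, np.ones(n))

# Reciprocal of int_0^T h_T(t) dt approximates the variance of the continuous MLE.
var_continuous = 1.0 / (h.sum() * dt)
```

Since the kernel is even and the right-hand side is constant, the solution is symmetric about $T/2$; for this particular kernel it is flat ($\approx 1/2$) in the interior, with boundary spikes mimicking the atoms of the exact solution of the continuous equation.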
Corollary 3.4. The maximum likelihood estimator of $\theta$ by continuous observations is given by
$$\hat\theta_T = \frac{\int_0^T h_T(t)\, dX_t}{\int_0^T h_T(t)\, dt}. \qquad (10)$$
3.2. Properties of the estimator. It follows immediately from (10) that the maximum likelihood estimator $\hat\theta_T$ is equal to
$$\hat\theta_T = \theta + \frac{\int_0^T h_T(t)\, dB_t}{\int_0^T h_T(t)\, dt}. \qquad (11)$$
Proposition 3.5. The estimator $\hat\theta_T$ is unbiased and normally distributed. Its variance is equal to
$$\operatorname{var} \hat\theta_T = \left(\int_0^T h_T(t)\, dt\right)^{-1}. \qquad (12)$$
Proof. Unbiasedness and normality follow from the fact that $\hat\theta_T - \theta$ is a linear functional of the centered Gaussian process $B$. By (6),
$$\operatorname{var} \int_0^T h_T(t)\, dB_t = \langle \Gamma_T h_T, h_T\rangle = \left\langle \mathbf{1}_{[0,T]}, h_T\right\rangle = \int_0^T h_T(t)\, dt.$$
Thus, equation (12) immediately follows from (11).
Theorem 3.6. If
$$\int_0^T h_T(t)\, dt \to \infty, \quad T \to \infty, \qquad (13)$$
then the estimator $\hat\theta_T$ is mean-square consistent.
It can be hard to verify the condition (13) directly. The following result gives sufficient conditions for the consistency in terms of the autocovariance function of $B$.
Theorem 3.7. Let the process $B = \{B_t, t \ge 0\}$ satisfy Assumptions 1 and 2. If the covariance function of the increment process $B_N - B_{N-1}$ tends to 0,
$$\operatorname{cov}\left(B_1,\, B_N - B_{N-1}\right) \to 0, \quad N \to \infty,$$
then the maximum likelihood estimator $\hat\theta_T$ is mean-square consistent.
Proof. The estimator $\hat\theta^{(N)}$ from the discrete sample $\{X_1, \dots, X_N\}$ is mean-square consistent by Theorem 2.5. The estimator from the continuous-time sample $\{X_t, t \in [0, T]\}$ is unbiased. We now compare the variances of the discrete and continuous-time estimators; the desired inequalities are obtained from the proof of Theorem 2.5. Suppose that $T \ge 1$ and $N$ is the integer such that $N \le T < N + 1$. By equation (9) we have
$$\int_0^T h_T(t)\, dt \ge z^\top \left(\Gamma^{(N)}\right)^{-1} z.$$
As $\operatorname{var} \hat\theta_T = \left(\int_0^T h_T(t)\, dt\right)^{-1} \le \operatorname{var} \hat\theta^{(N)} \to 0$ as $N \to \infty$, the mean-square consistency follows. To prove the strong consistency of $\hat\theta_T$, we need the following auxiliary result.
Lemma 3.8. Let the process B satisfy the conditions of Theorem 3.3. Then the estimator processθ = {θ T , T ≥ 0} has independent increments.
Proof. Let $T_2 \le T_3$. Then, similarly to the proof of Lemma 2.6, the random process $\hat\theta_T$ is Gaussian and its increments are proved to be uncorrelated, so they are independent.
Theorem 3.9. Under the conditions of Theorem 3.7, the estimator $\hat\theta_T$ is strongly consistent.
Remark 3.10. The Brownian motion does not satisfy Assumption 1 (for the covariance function $\min(s, t)$ of the Wiener process, $\frac{\partial \min(s,t)}{\partial t} = \mathbf{1}_{\{t < s\}}$ is not continuous in $s$). So we extend our model so that it can handle the Wiener process. Let the process $B$ be a sum of two independent random processes,
$$B = B^C + W, \qquad (15)$$
where $B^C$ satisfies Assumption 1, and $W$ is a standard Wiener process. Let us look at how the statements change if the process $B$ admits the representation (15) (Assumption 1 for $B$ is dropped). Lemma 3.1 changes as follows: