Test Statistics for Parameter Changes in INAR(p) Models and a Simulation Study

In this paper a change detection procedure is introduced for integer-valued autoregressive processes, inspired by the ordinary autoregressive model. The power of the test is investigated in a simulation study to determine how effectively it can detect changes in the key parameters of the process. Zusammenfassung: In this article we propose a method that can be used to detect changes in an integer-valued autoregressive process. A simulation study is then carried out to determine how often the test can detect a change in the various parameters.


Introduction
Detecting a change of a parameter in a stochastic model is an important task, and one that has been investigated for decades. In econometrics, medicine, or population dynamics, failing to notice a change in a parameter will lead to unreliable estimates of the parameters and, ultimately, false predictions. Several statistics have been constructed for a wide range of stochastic models to test for parameter change. For a review of these techniques we refer to Csörgő and Horváth (1997).
In this paper we will consider the integer-valued autoregressive process of order p, abbreviated INAR(p), and test for a change in several crucial parameters. The INAR(p) model, introduced by Alzaid and Al-Osh (1987) and Alzaid and Al-Osh (1990), is an integer analogue of the AR(p) model that is extensively used in time series analysis. Its main advantage is that it is nonnegative and integer-valued, and therefore more applicable in situations where we have data of this type, most frequently count data.
The approach we will use to test for a parameter change is similar to the CUSUM approach described in Kang and Lee (2009) and Lee and Na (2005). After introducing the test statistics we will describe an empirical simulation study which investigates the power of the test as a function of several parameters of the process.
In this first section we will introduce the INAR(p) model and describe the statistical problem that has motivated this paper. The second section contains the details of the calculation of the test statistics, and the third the calculation of the information matrix. The fourth section is dedicated to the simulation study.

The INAR(p) Model
Definition. An INAR(p) time series model is a stochastic process (X_k)_{k≥−p+1}, k ∈ Z, given by

X_k = ∑_{i=1}^{p} ∑_{j=1}^{X_{k−i}} ξ_{i,k}(j) + ε_k,

where for all i ∈ N, {ξ_{i,k}(j)}_{j∈N} is a sequence of i.i.d. Bernoulli random variables with mean α_{i,k}, and (ε_k)_{k∈N} is a sequence of independent nonnegative integer-valued random variables with mean µ_k and variance σ²_k, such that all these sequences are independent of each other. For technical reasons we will suppose throughout the paper that X_0, X_{−1}, ..., X_{−p+1} are deterministic. The α's are called the coefficients, and the distribution of the ε_k's is called the innovation distribution. We will also call the distributions of the ξ_{i,k}(j)'s the offspring distributions.
Remark. Using the binomial thinning operator due to Steutel and van Harn (1979), defined by

α ∘ X = ∑_{j=1}^{X} ξ_j if X > 0, and α ∘ X = 0 otherwise,

where (ξ_j)_{j∈N} is a sequence of i.i.d. Bernoulli random variables with mean α, we can also write the process in the following form:

X_k = α_{1,k} ∘ X_{k−1} + ... + α_{p,k} ∘ X_{k−p} + ε_k.
Remark. The ∘ operator emphasizes the similarity to the AR(p) model; indeed, many of our results were inspired by analogous results for AR(p) processes. There are, however, two very important differences. The first is the innovation: in the AR(p) model it is assumed to have zero mean, whereas here it is nonnegative. The second is that the deterministic regression (not counting the innovation) in AR(p) is replaced here by a stochastic regression, which makes it more fruitful to consider the INAR(p) model as a special multitype branching process with immigration. We will also use this approach later on.
Remark. The motivation of the process is easy to understand: consider a population in which every member has 0 or 1 offspring at each of p consecutive time steps and then dies. The numbers of offspring are independent across the different time steps, and the members act independently of each other. If we augment this process with stochastic immigration, we obtain the INAR(p) model.
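The definition above translates directly into a simulation recursion: each of the p lags contributes a Binomial(X_{k−i}, α_i)-thinned count, and an innovation is added. A minimal Python sketch (the paper's own study used Octave; the Poisson innovation choice and all parameter values here are merely illustrative):

```python
import numpy as np

def simulate_inar(alpha, mu_eps, n, x_init=None, rng=None):
    """Simulate an INAR(p) trajectory with Bernoulli offspring and Poisson
    innovations.  `alpha` holds the thinning coefficients (alpha_1, ..., alpha_p);
    the Poisson innovation with mean `mu_eps` is an assumption for illustration."""
    rng = np.random.default_rng() if rng is None else rng
    p = len(alpha)
    x = np.zeros(n + p, dtype=int)
    if x_init is not None:
        x[:p] = x_init  # deterministic starting values X_0, ..., X_{-p+1}
    for k in range(p, n + p):
        # binomial thinning: alpha_i o X_{k-i} ~ Binomial(X_{k-i}, alpha_i)
        thinned = sum(rng.binomial(x[k - i], alpha[i - 1]) for i in range(1, p + 1))
        x[k] = thinned + rng.poisson(mu_eps)  # add the innovation
    return x[p:]

traj = simulate_inar([0.5], 1.0, 400, rng=np.random.default_rng(0))
```

For α_1 = 0.5 and µ_ε = 1 the trajectory fluctuates around the stationary mean µ_ε/(1 − α_1) = 2, which matches the baseline setting of the simulation study.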

The Goal of the Analysis
The parameters of the process are contained in the parameter vector θ_k defined by

θ_k = (α_{1,k}, ..., α_{p,k}, µ_k)^T.

T. Szabó 267
The variance of the innovation, denoted by σ²_k, is not included in θ_k because we will not test for a change in that quantity. A notation was nevertheless introduced for it because its estimation is necessary to calculate the test statistics.
Let us split θ_k = (ϕ_k, η_k) into a parameter vector of interest ϕ_k and a nuisance vector η_k. We want to test the null hypothesis that ϕ_k remains constant over the observation period against the alternative that it changes at some unknown time:

H_0: ϕ_1 = ϕ_2 = ... = ϕ_n versus H_1: there is a k* ∈ {1, ..., n − 1} such that ϕ_1 = ... = ϕ_{k*} ≠ ϕ_{k*+1} = ... = ϕ_n.

Calculation of the Test Statistics
In this section we will work under the null hypothesis and derive test statistics whose limiting distribution is a well-known process. Under the null hypothesis the process takes the following form:

X_k = ∑_{i=1}^{p} ∑_{j=1}^{X_{k−i}} ξ_i(k, j) + ε(k),

where for all i ∈ N, {ξ_i(k, j)}_{j∈N} is a sequence of i.i.d. Bernoulli random variables with mean α_i, and (ε(ℓ))_{ℓ∈N} is a sequence of i.i.d. nonnegative integer-valued random variables with mean µ_ε and variance σ²_ε, such that all these sequences are independent of each other.
Remark. Note that the index k has moved from the subscript into parentheses for each variable. This emphasizes the fact that their distributions no longer depend on k.
We can distinguish three types of INAR(p) processes, according to whether α_1 + ... + α_p is less than, equal to, or greater than 1. In this paper we will only consider the stable case, i.e., we assume that under the null hypothesis

α_1 + ... + α_p < 1.

First we calculate the estimates of the parameters using the conditional least squares (CLS) method. This method, first proposed by Klimko and Nelson (1978), relies on the sequence

M_k = X_k − E(X_k | F_{k−1}) = X_k − α_1 X_{k−1} − ... − α_p X_{k−p} − µ_ε,

where F_n is the σ-algebra generated by X_1, ..., X_n. Given an observed trajectory X_1, ..., X_n, the sequence (M_k)_{k=1}^n depends only on the parameters. The conditional least squares estimates of the parameters are the values for which the sum of squares ∑_{k=1}^n M_k² is minimal; after simple calculations they are obtained as the solution of the corresponding linear normal equations. Replacing the parameters in M_k with their estimates we obtain the sequence M̂_k^{(n)}. The test statistics will be based on the stochastic process (M̂_n(t))_{0≤t≤1}, built from the partial sums up to index ⌊nt⌋ of the estimated score contributions, normalized by n^{−1/2}(Î_n(θ̂^{(n)}))^{−1/2}, where ⌊x⌋ denotes the integer part of a real number x ∈ R, A^{−1/2} denotes the unique symmetric positive definite square root of the inverse of a symmetric positive definite matrix A, and Î_n(θ̂^{(n)}) is a strongly consistent estimator of an analogue of the Fisher information matrix.
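Because M_k is linear in the parameters, minimizing ∑_k M_k² reduces to ordinary least squares of X_k on the lagged values and an intercept. A Python sketch of this CLS step (the seed and parameter values in the check are illustrative only):

```python
import numpy as np

def cls_estimate(x, p=1):
    """Conditional least squares (CLS) estimates for an INAR(p) process:
    minimize sum_k M_k^2 with M_k = X_k - alpha_1 X_{k-1} - ... - alpha_p X_{k-p} - mu_eps,
    which reduces to least squares of X_k on (X_{k-1}, ..., X_{k-p}, 1)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # lag columns X_{k-1}, ..., X_{k-p} plus an intercept column for mu_eps
    lags = np.column_stack([x[p - i:n - i] for i in range(1, p + 1)])
    design = np.column_stack([lags, np.ones(n - p)])
    theta, *_ = np.linalg.lstsq(design, x[p:], rcond=None)
    return theta[:p], theta[p]  # (alpha_1_hat, ..., alpha_p_hat), mu_eps_hat

# illustrative check on a simulated INAR(1) trajectory with alpha_1 = 0.5, mu_eps = 1
rng = np.random.default_rng(1)
x = np.zeros(401, dtype=int)
for k in range(1, 401):
    x[k] = rng.binomial(x[k - 1], 0.5) + rng.poisson(1.0)
alpha_hat, mu_hat = cls_estimate(x[1:])
```

With 400 data points, the trajectory length used throughout the study, the estimates typically land within a few hundredths of the true α_1 and within a few tenths of the true µ_ε.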
Using techniques similar to those of Lee and Na (2005) we can prove a theorem (Theorem 2.1) essentially parallel to Theorem 2.1 of that paper. Using Theorem 2.1 we can prove the following result (Theorem 2.2): let (B(t))_{0≤t≤1} be a Brownian bridge whose components are independent standard one-dimensional Brownian bridges. Then, under the conditions of Theorem 2.1, (M̂_n(t))_{0≤t≤1} converges in distribution to (B(t))_{0≤t≤1} as n → ∞.
Using Theorem 2.2 we can construct three tests for a change in any parameter, similarly to Berkes, Gombay, and Horváth (2009). The components of the Brownian bridge (B(t))_{0≤t≤1} in Theorem 2.1 are independent, therefore we can test for change in several parameters simultaneously. If we perform d tests together, each at significance level α*, then under H_0 each individual test accepts H_0 with probability 1 − α*. We accept H_0 if and only if all tests accept it, and the tests are independent, therefore the probability of the simultaneous test accepting H_0 is (1 − α*)^d. Hence, for the simultaneous test to have significance level α, we have to choose α* = 1 − (1 − α)^{1/d}; for example, for α = 0.05 and d = 2 this gives α* ≈ 0.0253. Writing M̂_{n,i} for the i-th component of the test process and c_{α*} for the appropriate quantile of the limiting distribution, the tests are the following.

Test 1 (one-sided). If sup_{0≤t≤1} M̂_{n,i}(t) > c_{α*}, then we reject H_0 and conclude that there was a downward change in the i-th parameter. Similarly, if inf_{0≤t≤1} M̂_{n,i}(t) < −c_{α*}, then we reject H_0 and conclude that there was an upward change in the i-th parameter. Here c_{α*} is the 1 − α* quantile of sup_{0≤t≤1} B(t).

Test 2 (two-sided). If sup_{0≤t≤1} |M̂_{n,i}(t)| > c_{α*}, then we reject H_0 and conclude that there was a change in the i-th parameter. Here c_{α*} is the 1 − α* quantile of sup_{0≤t≤1} |B(t)|.

Test 3 (range). If sup_{0≤t≤1} M̂_{n,i}(t) − inf_{0≤t≤1} M̂_{n,i}(t) > c_{α*}, then we reject H_0 and conclude that there was a temporary change (also called an epidemic alternative) in the i-th parameter, as defined in Csörgő and Horváth (1997), 1.7.4. Here c_{α*} is the 1 − α* quantile of sup_{0≤t≤1} B(t) − inf_{0≤t≤1} B(t).

The cumulative distribution functions needed to compute the appropriate quantiles are, for x > 0,

P(sup_{0≤t≤1} B(t) ≤ x) = 1 − e^{−2x²},
P(sup_{0≤t≤1} |B(t)| ≤ x) = 1 − 2 ∑_{k=1}^{∞} (−1)^{k−1} e^{−2k²x²},
P(sup_{0≤t≤1} B(t) − inf_{0≤t≤1} B(t) ≤ x) = 1 − ∑_{k=1}^{∞} 2(4k²x² − 1) e^{−2k²x²}.

The first two results are well known, the third is due to Kuiper (1960).
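The quantiles of these limiting distributions have no simple closed form but are easy to obtain numerically. A sketch, assuming the standard series expansions for the supremum, supremum-of-modulus (Kolmogorov), and range (Kuiper) distributions of the Brownian bridge; the truncation at 100 terms and the bisection tolerances are implementation choices, not part of the paper:

```python
import math

def sup_cdf(x):
    # P(sup_{0<=t<=1} B(t) <= x) for a standard Brownian bridge B
    return 1.0 - math.exp(-2.0 * x * x) if x > 0 else 0.0

def kolmogorov_cdf(x):
    # P(sup_{0<=t<=1} |B(t)| <= x), the Kolmogorov distribution
    if x <= 0:
        return 0.0
    return 1.0 - 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * x * x)
                           for k in range(1, 101))

def kuiper_cdf(x):
    # P(sup B - inf B <= x), due to Kuiper (1960)
    if x <= 0:
        return 0.0
    return 1.0 - sum(2.0 * (4.0 * k * k * x * x - 1.0) * math.exp(-2.0 * k * k * x * x)
                     for k in range(1, 101))

def quantile(cdf, q, lo=0.0, hi=10.0, iters=80):
    # bisection for the q-quantile of a continuous, increasing CDF
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if cdf(mid) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

For instance, the 95 % quantiles come out near 1.224 (Test 1), 1.358 (Test 2), and 1.747 (Test 3), the familiar one-sided, Kolmogorov, and Kuiper critical values.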

Calculation of the Information Matrix
The heuristic reasoning behind the definition of (M̂_n(t))_{0≤t≤1} is the following: due to the martingale central limit theorem, the suitably normalized partial sums of the M_k converge to c (W_t)_{0≤t≤1}, where c is a constant depending on θ and σ²_ε, and (W_t)_{0≤t≤1} is a standard Brownian motion. Therefore, by a rough approximation, the vector (M_1, ..., M_n) behaves like a Gaussian vector with covariance matrix c² I_n, where I_n is the n × n identity matrix, and the likelihood function is the corresponding Gaussian likelihood. We will take the logarithm of the likelihood function and work with that quantity, regarding the first term as constant. This is a simplification, because c actually depends on the parameters, but taking this into account leads to calculations that are difficult to handle. Also, we will not take into account the constant factor before the sum of the M_k² but will rather work with the analogue of the information matrix. Therefore, the analogue of the loglikelihood function that we consider will be

Q_n = −(1/2) ∑_{k=1}^{n} M_k².

Using this function we develop results similar to those of the likelihood ratio tests described in Csörgő and Horváth (1997). The proof that this heuristic reasoning leads to valid results is Theorem 2.1. Based on Q_n, the analogue of the efficient score vector is ∂Q_n/∂θ.

The appropriate analogue of the Fisher information matrix is the conditional covariance matrix of the efficient score vector,

I_n(θ) = ∑_{k=1}^{n} E( M_k² (∂M_k/∂θ)(∂M_k/∂θ)^T | F_{k−1} ).

The vector X̃_{k−1} = (X_{k−1}, ..., X_{k−p}, 1)^T is measurable with respect to F_{k−1}, and ∂M_k/∂θ = −X̃_{k−1}, therefore

I_n(θ) = ∑_{k=1}^{n} E( M_k² | F_{k−1} ) X̃_{k−1} X̃_{k−1}^T.

By simple calculation,

E( M_k² | F_{k−1} ) = Var(X_k | F_{k−1}) = ∑_{i=1}^{p} α_i (1 − α_i) X_{k−i} + σ²_ε.

Since σ²_ε appears in I_n(θ), we have to provide an estimator for it. To do this, we introduce the sequence of martingale differences M_k² − E(M_k² | F_{k−1}). Replacing the α coefficients and µ_ε, both in this formula and in M_k², by their estimates, the conditional least squares principle again requires minimizing the resulting sum of squares; taking the derivative with respect to σ²_ε we obtain the CLS estimate

σ̂²_ε = (1/n) ∑_{k=1}^{n} ( (M̂_k)² − ∑_{i=1}^{p} α̂_i (1 − α̂_i) X_{k−i} ).
The matrix Î_n(θ) is defined by replacing the variance σ²_ε in I_n(θ) by its CLS estimate. Remark. Using the ergodic theorem for Markov chains it can be proved that n^{−1} I_n(θ) → I(θ) almost surely for some matrix I(θ). The entries of I(θ) are increasing functions of the variances of the offspring and innovation distributions. This will explain some of the phenomena arising later.
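The variance estimation step can be sketched in a few lines, assuming the Bernoulli-offspring conditional variance Var(X_k | F_{k−1}) = ∑ α_i(1 − α_i) X_{k−i} + σ²_ε; the function name and the simulated check are illustrative:

```python
import numpy as np

def cls_sigma2(x, alpha, mu_eps):
    """CLS estimate of the innovation variance sigma^2_eps for an INAR(p)
    process: the mean of M_k^2 minus the offspring part of the conditional
    variance, sum_i alpha_i (1 - alpha_i) X_{k-i}.  A sketch under the
    Bernoulli-offspring model; `alpha` and `mu_eps` would normally be the
    CLS estimates."""
    x = np.asarray(x, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    p, n = len(alpha), len(x)
    lags = np.column_stack([x[p - i:n - i] for i in range(1, p + 1)])
    m = x[p:] - lags @ alpha - mu_eps                  # martingale differences M_k
    offspring_var = lags @ (alpha * (1.0 - alpha))     # offspring part of Var(X_k | F_{k-1})
    return float(np.mean(m ** 2 - offspring_var))
```

With Poisson(1) innovations the true σ²_ε equals 1, so on a long simulated trajectory the estimate should settle near 1.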

Simulation Study
The second part of the paper deals with a simulation study that was carried out using Octave 2.3.4. Both the recursive generation of the trajectories and the calculation of the test statistics are given explicitly in the previous sections, therefore the implementation was straightforward and no further details are needed here.

Technical Details
The simulations were run on dual-core Intel processors under Windows XP. Generating a trajectory of length 400 and analyzing it took approximately 0.4 seconds, so the typical experiment of 1000 simulations took approximately 400 seconds. The running time increased approximately linearly with trajectory length. Higher-order cases also required significantly more time due to the higher-dimensional matrix calculations.

Rate of Convergence and the Optimal Trajectory Length
Theorem 2.2 provides the asymptotic distribution of the test statistics but says nothing about the rate of convergence. We first investigated this issue by performing Test 2, both individually and simultaneously, for INAR(1) processes with no change. For a trajectory length of 400, an asymptotic significance level of 95 % resulted in typical empirical Type I error rates of 3.4-3.8 % instead of the nominal 5 %. This effect disappeared slowly: the Type I error rate was between 4.7 % and 5.1 % for a trajectory length of 4000. A natural conclusion would be to perform the simulation study with trajectories of length 4000. Apart from a tenfold increase in running time, there were two major problems with this. Firstly, the average researcher is quite unlikely to have a time series with 4000 data points, so such a study would say little to prospective users of the method. Secondly, for a trajectory length of 4000 the empirical power is larger than 0.9 for almost any change, and therefore the phenomena described below cannot be observed. For these reasons we decided to investigate trajectories with 400 data points, with the caveat that in these cases the Type I error rate may differ from the nominal rate α by a few percentage points.
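Empirical rates like these come from repeating the simulate-and-test cycle and counting rejections. A small Monte-Carlo harness in that spirit (the function name is illustrative; `run_once` stands in for one simulate-and-test repetition and should return True on rejection):

```python
import numpy as np

def empirical_rate(run_once, n_reps=1000, seed=0):
    """Monte-Carlo estimate of a rejection rate.  Mirrors the study's design
    of 1000 repetitions per parameter setting: under H_0 the result estimates
    the Type I error rate, under an alternative it estimates the power."""
    rng = np.random.default_rng(seed)
    return sum(bool(run_once(rng)) for _ in range(n_reps)) / n_reps
```

As a sanity check, plugging in a dummy test that rejects with probability 0.05 should return a rate close to 0.05; the Monte-Carlo standard error with 1000 repetitions is about 0.007, which bounds the precision of all power figures quoted below.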

The First-order Case -General Observations
The first case investigated was an INAR(1) process with Poisson-distributed innovation. We performed Test 2 for both parameters simultaneously with an overall significance level of 0.95. The length of a trajectory was n = 400, and the change occurred at k = 200. Before the change, the coefficient α_1 was 0.5 and the innovation was Poisson distributed with mean µ_ε = 1. After the change, both parameters took different values: α_1 was selected from the set {0, 0.1, 0.2, ..., 0.9}, and µ_ε was selected from {0, 0.2, 0.4, ..., 2}. For every such pair of after-change parameters, a Monte Carlo experiment with 1000 repetitions was run to determine the empirical power of the simultaneous two-sided test for both parameters. The results can be seen in Figure 1. Tables 1 and 2 are also provided for the cases where there was a change in only one of the two parameters. It is important to note that the power is significantly lower along the line

µ_ε / (1 − α_1) = 2.

The explanation is that this expression is the mean of the unique stationary distribution, which before the change equals 1/(1 − 0.5) = 2.
Apparently, the test is very sensitive to a change in this quantity. Figure 2 illustrates this effect with two typical trajectories: in the first, the mean of the stationary distribution remains the same; in the second, it decreases. The change can be detected by eye. Another shortcoming of the test is that the power drops steeply along α_1 = 0.9 when there is an increase in µ_ε. This is due to the fact that in this case α_1 is slightly overestimated (if α_1 changes to 0.9 and µ_ε to 1.2, the mean of the CLS estimates for α_1 was 0.975), and for α_1 values close to 1 the behavior of the process changes drastically. This is the nearly unstable case; in the unstable case the process loses its stationarity, E(X_k) → ∞ linearly, and Theorem 2.2 is no longer valid, see e.g. Barczy, Ispány, and Pap (2011). This case is illustrated by Figure 3, where we can see that a small change in the coefficient can be accompanied by a large change in the innovation without any visible change occurring; for comparison, Figure 3 also shows a trajectory in which the coefficient and the innovation mean are constant throughout.

The Effect of Different Trajectory Lengths and Change Points
The above anomalies can be avoided by investigating longer trajectories. We did this for three different cases. The first was the nearly unstable case of α_1 = 0.9 and µ_ε = 2 after the change. Here the power increased rapidly and reached 0.985 at a trajectory length of 1400. The second case was α_1 = 0.4 and µ_ε = 1.2 after the change. Here both changes are very small and in opposite directions, partly cancelling each other out. It is no surprise that the test performed very poorly for a trajectory length of 400, and longer trajectories remedied this problem only slowly, the power reaching 0.9 only at a trajectory length of 6600. However, the third case demonstrated that this is mostly due to the small magnitude of the changes and not to the fact that the mean of the stationary distribution remained constant. Here the mean of the stationary distribution again remained constant, but the changes were greater in magnitude: α_1 changed to 0.8 and µ_ε to 0.4. The power reached 0.9 at a trajectory length of only 600. Tables 3, 4 and 5 summarize these results. It can also be expected that the test performs poorly when the change occurs near the beginning or the end of the trajectory. The simulation confirmed this conjecture. However, it also showed that the test performed well in a broader range around the middle of the trajectory than would have been expected. The results can be found in Tables 6 and 7. The trajectory length was again 400, for comparison with the previous subsection; α_1 changed to 0.3 and 0.7, and µ_ε to 0.8 and 1.2, for a downward and an upward change, respectively.
The simulation also revealed an interesting phenomenon: for a downward change the power was significantly greater if the change occurred near the beginning of the trajectory, whereas an upward change was detected with greater probability near the end of the trajectory.
One heuristic explanation is the following: large values of the test process appear because the parameters are misestimated. The effect is that in every step M̂_k differs substantially from M_k. These deviations sum up and result in large absolute values of the test process and, ultimately, the rejection of the null hypothesis. However, this effect is partly canceled out by the smaller values of (Î_n(θ̂^{(n)}))^{−1/2}. The entries of Î_n(θ̂^{(n)}) increase with the estimated variances of the innovation and offspring distributions, and in this case all these variances increase with the respective means. For a downward change near the beginning of the trajectory, the smaller mean is dominant and the estimate will be close to it; therefore Î_n(θ̂^{(n)}) will have smaller entries and will shrink extreme values of the process to a smaller extent. If the change occurs near the end of the trajectory, the larger mean is dominant, and the larger entries of Î_n(θ̂^{(n)}) shrink the extreme values to a greater extent, so the null hypothesis is accepted with greater probability. For an upward change these phenomena occur similarly, only at the opposite ends of the trajectory.

Individual Tests
The next area of investigation was the performance of the individual tests. First we checked whether a change in one of the parameters caused a false detection for the other one. For this we applied Test 2 for each parameter and changed only the other one. The range of after-change parameters and every other setting was the same as in 4.3. The results are contained in Tables 8 and 9. The findings are the following: for the α_1 coefficient the test performed poorly only if there was a great upward change in µ_ε. This is due to a fundamental shortcoming of the application of CLS estimation to these processes. The CLS estimates are designed for an INAR(p) process with no change and perform very well under those conditions. If, however, the parameters change over time, the behavior of the CLS estimates cannot be simply described. Especially if a drastic change occurs in the parameters and the variance of the process is relatively large, the best-fitting model may well be one with a relatively small innovation but α_1 close to 1. It was indeed found that for large changes of µ_ε the CLS estimate of α_1 differed greatly from the real value even if there was no change in α_1. This phenomenon was even more pronounced for larger innovations (see 4.6) and also affected µ_ε. It is important to note, however, that while the estimates for µ_ε were incorrect, the null hypothesis was accepted for µ_ε with higher probability than for α_1.
Another point of interest is the value of the last two entries in Table 9. When α_1 changes to 0.8, a change is falsely detected in µ_ε with probability 0.275. However, when α_1 changes to 0.9, a larger value, the rate of false detections drops to 0.007. This is again due to the nearly unstable phenomenon. The estimates of the parameters differ from the real values, but this effect is countered by the increased variance and therefore the much smaller values of (Î_n(θ̂^{(n)}))^{−1/2}. As a result, the absolute value of the process remains small and the null hypothesis is accepted.
Next, we investigated the power of the one-sided tests. For this we used Test 1, which was performed individually for both parameters with no change occurring in the other parameter. As can be seen in Tables 10 and 11, the one-sided test did not perform notably better than the simultaneous two-sided test. A little explanation may be needed here on the design of Test 1. As noted in 4.4, extreme values of the test statistic arise due to misestimation of the parameters. An upward change in one of the parameters means that the parameter will be overestimated before the change and underestimated after it (this is not always the case, as will be seen in 4.6, but is generally true). Therefore the test statistic takes negative values before the change and then begins to increase after it. Hence, if it takes a value outside the critical interval, that will be a negative value. Similarly, a downward change will result in large positive values of the test statistic.

The Magnitude of the Innovation
In the next case the innovation distribution was again Poisson, but this time with mean µ_ε = 100. The question here was whether the power depends on the absolute or on the relative magnitude of the change. The coefficient did not change, all other parameters were the same as in 4.3, and the simultaneous two-sided test was carried out for both parameters. The results can be found in Table 12.
There were two important findings. First, the Type II error rate seems to depend on the absolute rather than the relative amount of change in the innovation mean, because here the empirical power was 1 in almost every case. There were two important exceptions. A change in the structure of the process such as this is detected with smaller probability. The exact numbers can be found in Table 16.

Conclusions
From the simulation described above we can draw the following conclusions: the test is able to detect a change in any of the described parameters. However, the power of the test decreases for changes that occur very close to the beginning or the end of the trajectory. Larger changes are detected with greater probability, and a change in the mean of the stationary distribution adds greatly to the power of the test. The absolute, rather than the relative, magnitude of the change in the innovation influences the power of the test. One-sided tests are more powerful than their two-sided alternatives.
Unfortunately, the test also has several shortcomings. The individual test falsely detects a change if there is a large change in one of the other parameters. If the coefficient is close to 1, or the change in the parameters is large enough, the CLS-fitted process has a large variance and the test fails in almost every case. Therefore, the test is best applied when the change in the parameters is moderate and the coefficient is not too close to 1.
These observations are valid for a trajectory length of 400, but longer trajectories increase the power greatly, and for a trajectory with several thousand data points changes are almost always detected. For long trajectories the Type I error rate is approximately what the significance levels lead one to expect, while for shorter ones it may be a few percentage points lower. With these limitations, however, the test can be used to detect parameter change.

Table 1 :
First-order process, change in the coefficient. Empirical power for constant µ_ε = 1; α_1 changes from 0.5.

Table 2 :
First-order process, change in the innovation

Table 3 :
Effect of trajectory length for the nearly unstable case. Empirical power; µ_ε changes from 1 to 2, α_1 from 0.5 to 0.9.

Table 4 :
Effect of trajectory length for a small change. Empirical power; µ_ε changes from 1 to 1.2, α_1 from 0.5 to 0.4.

Table 5 :
Effect of trajectory length when the mean of the stationary distribution is constant. Empirical power; µ_ε changes from 1 to 0.4, α_1 from 0.5 to 0.8.

Table 6 :
Effect of the change point for a downward change. Empirical power; µ_ε changes from 1 to 0.8, α_1 from 0.5 to 0.3.

Table 7 :
Effect of the change point for an upward change. Empirical power; µ_ε changes from 1 to 1.2, α_1 from 0.5 to 0.7.

Table 8 :
First-order process, two-sided test for the unchanged coefficient

Table 9 :
First-order process, two-sided test for the unchanged innovation. Accept rates of Test 2 for µ_ε for constant µ_ε = 1; α_1 changes from 0.5.

Table 10 :
First-order process, one-sided test for the innovation. Empirical power for constant α_1 = 0.5; µ_ε changes from 1.

Table 11 :
First-order process, one-sided test for the coefficient. Empirical power for constant µ_ε = 1; α_1 changes from 0.5.