The memochange package can be used for two things: checking for a break in persistence and checking for a change in mean. This vignette presents the functions related to a break in persistence. This includes BP_estim, cusum_test, LBI_test, LKSN_test, MR_test, ratio_test, and pb_sim. Before considering the usage of these functions, a brief literature review elaborates on their connection.
The degree of memory is an important determinant of the characteristics of a time series. For an \(I(0)\), or short-memory, process (e.g., AR(1) or ARMA(1,1)), the impact of shocks is short-lived and dies out quickly. On the other hand, for an \(I(1)\), or difference-stationary, process such as the random walk, shocks persist infinitely. Thus, any change in a variable will have an impact on all future realizations. For an \(I(d)\), or long-memory, process with \(0<d<1\), shocks neither die out quickly nor persist infinitely, but have a hyperbolically decaying impact. In this case, the current value of a variable depends on past shocks, but less so the further back in time these shocks occurred.
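The three types of shock persistence can be made concrete by looking at impulse responses, i.e. the effect of a unit shock on the series k periods later. The following is a minimal base R sketch; the helper frac_weights, the AR coefficient 0.5, and the value d = 0.4 are illustrative assumptions and not part of memochange.

```r
# Impulse responses (MA(infinity) weights) of three illustrative processes.
# Element k + 1 of each vector is the effect of a unit shock k periods later.
horizon <- 0:50

# I(0) example: AR(1) with coefficient 0.5 (arbitrary illustrative value),
# whose weights decay geometrically.
psi_ar1 <- 0.5^horizon

# I(d) process: weights of (1 - L)^(-d), computed recursively via
# psi_0 = 1 and psi_k = psi_{k-1} * (k - 1 + d) / k.
# frac_weights is a hypothetical helper written for this sketch.
frac_weights <- function(d, k_max) {
  k <- seq_len(k_max)
  cumprod(c(1, (k - 1 + d) / k))
}

psi_long <- frac_weights(d = 0.4, k_max = 50)  # long memory: hyperbolic decay
psi_rw   <- frac_weights(d = 1,   k_max = 50)  # random walk: shocks never decay

# Compare the weights at horizons 0, 1, 5, 10, and 50.
round(cbind(horizon, psi_ar1, psi_long, psi_rw)[c(1, 2, 6, 11, 51), ], 4)
```

At long horizons the AR(1) weights are numerically zero, the long-memory weights are small but still clearly positive, and the random walk weights stay at one.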
There are plenty of procedures to determine the memory of a series (see Robinson (1995), Shimotsu (2010), among others). However, there is also the possibility that series exhibit a structural change in memory, often referred to as a change in persistence. Starting with Kim (2000), various procedures have been proposed to detect these changes and consistently estimate the change point. Busetti and Taylor (2004) and Leybourne and Taylor (2004) suggest approaches for testing the null of constant \(I(0)\) behavior of the series against the alternative that a change from either \(I(0)\) to \(I(1)\) or \(I(1)\) to \(I(0)\) occurred. However, both approaches show serious distortions if neither the null nor the alternative is true, e.g., if the series is constant \(I(1)\). In this case the procedures by Leybourne et al. (2003) and Leybourne, Taylor, and Kim (2007) can be applied as they have the same alternative, but assume constant \(I(1)\) behavior under the null. Again, the procedures exhibit distortions when neither the null nor the alternative is true. To remedy this issue, Harvey, Leybourne, and Taylor (2006) suggest an approach that entails the same critical values for constant \(I(0)\) and constant \(I(1)\) behavior. Consequently, it accommodates both constant \(I(0)\) and constant \(I(1)\) behavior under the null.
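To give an intuition for how a ratio-type test can pick up a change from \(I(0)\) to \(I(1)\), the following base R sketch simulates such a series and computes a simplified ratio of scaled squared partial sums after and before each candidate break fraction. It is written in the spirit of the ratio-based tests cited above, under the assumption that the data are demeaned within each subsample; it is not the implementation used in memochange, and ratio_stat as well as all simulation settings are hypothetical.

```r
set.seed(1)
n  <- 400
tb <- 200                       # true break point (illustrative choice)
e  <- rnorm(n)

# I(0) up to the break, I(1) afterwards: the second regime accumulates shocks.
x_break <- c(e[1:tb], e[tb] + cumsum(e[(tb + 1):n]))
x_i0    <- e                    # constant I(0) series for comparison

# Simplified ratio statistic for a candidate break fraction tau:
# scaled sum of squared partial sums of the demeaned second subsample,
# divided by the same quantity for the first subsample.
ratio_stat <- function(x, tau) {
  n  <- length(x)
  k  <- floor(tau * n)
  s1 <- cumsum(x[1:k] - mean(x[1:k]))
  s2 <- cumsum(x[(k + 1):n] - mean(x[(k + 1):n]))
  (sum(s2^2) / (n - k)^2) / (sum(s1^2) / k^2)
}

# Evaluate the statistic over a grid of candidate break fractions and
# summarise by the mean over the grid.
taus    <- seq(0.2, 0.8, by = 0.01)
k_break <- sapply(taus, ratio_stat, x = x_break)
k_i0    <- sapply(taus, ratio_stat, x = x_i0)

c(mean_break = mean(k_break), mean_i0 = mean(k_i0))
```

Large values of the ratio point towards higher persistence in the second part of the sample, while the reciprocal of the ratio would be informative about a change in the opposite direction.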
While this earlier work focussed on the \(I(0)/I(1)\) framework, more recent approaches are able to detect changes from \(I(d_1)\) to \(I(d_2)\) where \(d_1\) and \(d_2\) are allowed to be non-integers. Sibbertsen and Kruse (2009) extend the approach of Leybourne, Taylor, and Kim (2007) such that the testing procedure consistently detects changes from \(0 \leq d_1<1/2\) to \(1/2<d_2<3/2\) and vice versa. Under the null the test assumes constant \(I(d)\) behavior with \(0 \leq d <3/2\). The approach suggested by Martins and Rodrigues (2014) is even able to identify changes from \(-1/2<d_1<2\) to \(-1/2<d_2<2\) with \(d_1 \neq d_2\). Here, under the null the test assumes constant \(I(d)\) behavior with \(-1/2<d<2\).
Examples of series that potentially exhibit breaks in persistence are macroeconomic and financial time series such as inflation rates, trading volume, interest rates, volatilities and so on. For these series it is therefore strongly recommended to investigate the possibility of a break in persistence before modeling and forecasting the series.
The memochange package contains all procedures mentioned above to identify whether a time series exhibits a break in persistence. Additionally, several estimators are implemented which consistently estimate the point at which the series exhibits a break in persistence and the order of integration in the two regimes. We will now show the usage of the implemented procedures by investigating the price of crude oil.
First, we download the monthly price series from the FRED database.
oil=data.table::fread("https://fred.stlouisfed.org/graph/fredgraph.csv?bgcolor=%23e1e9f0&chart_type=line&drp=0&fo=open%20sans&graph_bgcolor=%23ffffff&height=450&mode=fred&recession_bars=on&txtcolor=%23444444&ts=12&tts=12&width=1168&nt=0&thu=0&trc=0&show_legend=yes&show_axis_titles=yes&show_tooltip=yes&id=MCOILWTICO&scale=left&cosd=1986-01-01&coed=2019-08-01&line_color=%234572a7&link_values=false&line_style=solid&mark_type=none&mw=3&lw=2&ost=-99999&oet=99999&mma=0&fml=a&fq=Monthly&fam=avg&fgst=lin&fgsnd=2009-06-01&line_index=1&transformation=lin&vintage_date=2019-09-23&revision_date=2019-09-23&nd=1986-01-01")
To get a first visual impression, we plot the series.
oil=as.data.frame(oil)
oil$observation_date=zoo::as.Date(oil$observation_date)
oil_xts=xts::xts(oil[,-1],order.by = oil$observation_date)
zoo::plot.zoo(oil_xts, xlab="", ylab="Price", main="Crude Oil Price: West Texas Intermediate")
From the plot we observe that the series seems to be more variable in its second part from the year 2000 onwards. This is first evidence that a change in persistence has occurred. We can test this hypothesis using the functions cusum_test (Leybourne, Taylor, and Kim (2007), Sibbertsen and Kruse (2009)), LBI_test (Busetti and Taylor (2004)), LKSN_test (Leybourne et al. (2003)), MR_test (Martins and Rodrigues (2014)), and ratio_test (Busetti and Taylor (2004), Leybourne and Taylor (2004), Harvey, Leybourne, and Taylor (2006)). In this vignette we use the ratio and MR tests since these are the ones most often applied empirically. The functionality of the other tests is similar. They all require a univariate numeric vector x as an input variable and yield a matrix of test statistic and critical values as an output variable.
As a starting point the default version of the ratio test is applied.
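x <- as.numeric(oil[, 2])  # price series as a numeric vector (assumed to be the second column of oil)
ratio_test(x)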
#>                                        90%    95%    99% Teststatistic
#> Against change from I(0) to I(1)    3.5148 4.6096 7.5536    225.943543
#> Against change from I(1) to I(0)    3.5588 4.6144 7.5304      1.170217
#> Against change in unknown direction 4.6144 5.7948 9.0840    225.943543
This yields a matrix that gives the test statistic and critical values for the null of constant \(I(0)\) against a change from \(I(0)\) to \(I(1)\) or vice versa. Furthermore, the statistics for a change in an unknown direction are included as well. These account for the multiple testing problem that arises because we perform two tests. The results suggest that a change from \(I(0)\) to \(I(1)\) has occurred somewhere in the series since the test statistic exceeds the critical value at the one percent level. In addition, this value is also significant when accounting for the multiple testing problem. Consequently, the default version of the ratio test suggests a break in persistence.
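The rejection decisions described here can also be read off programmatically. The sketch below rebuilds the matrix from the values printed in the ratio_test(x) output above and compares the test statistic with each critical value, assuming an upper-tail rejection rule; with the actual data, res would simply be the object returned by ratio_test(x).

```r
# Values copied from the printed ratio_test(x) output.
res <- matrix(
  c(3.5148, 4.6096, 7.5536, 225.943543,
    3.5588, 4.6144, 7.5304,   1.170217,
    4.6144, 5.7948, 9.0840, 225.943543),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    c("Against change from I(0) to I(1)",
      "Against change from I(1) to I(0)",
      "Against change in unknown direction"),
    c("90%", "95%", "99%", "Teststatistic")
  )
)

# Upper-tail rule (assumption): reject the null when the statistic exceeds
# the critical value. Each row's statistic is compared with its own row of
# critical values.
reject <- res[, "Teststatistic"] > res[, c("90%", "95%", "99%")]
reject
```

The first and third rows are rejected at all three levels, whereas the null is not rejected against a change from \(I(1)\) to \(I(0)\).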
We can modify this default version by choosing the arguments trend, tau, statistic, type, m, z, simu, and M (see the help page of the ratio test for details). The plot does not indicate a linear trend so that it seems unreasonable to change the trend argument. Also, the plot suggests that the break is rather in the middle of the series than at the beginning or the end so that changing tau seems unnecessary as well. The type of test statistic calculated can be easily changed using the statistic argument. However, simulation results indicate that the mean, max, and exp statistics deliver qualitatively similar results.
Something that is of more importance is the type of test performed. The default version considers the approach by Busetti and Taylor (2004). In case of a constant \(I(1)\) process this test often spuriously identifies a break in persistence. Harvey, Leybourne, and Taylor (2006) account for this issue by adjusting the test statistic such that its critical values are the same under constant \(I(0)\) and constant \(I(1)\). We can calculate their test statistic by setting type="HLT". For this purpose, we need to state the number of polynomials z used in their test statistic. The default value is 9 as suggested by Harvey, Leybourne, and Taylor (2006). Choosing another value is only sensible for very large data sets (more than 10000 observations) where the test statistic cannot be calculated due to computational singularity. In this case decreasing z can allow the test statistic to be calculated. This invalidates the critical values so that we would have to simulate them by setting simu=1. However, as our data set is rather small we can stick with the default of z=9.
ratio_test(x, type="HLT")
#>                                        90%    95%    99% Teststatistic 90%
#> Against change from I(0) to I(1)    3.5148 4.6096 7.5536        58.9102128
#> Against change from I(1) to I(0)    3.5588 4.6144 7.5304         0.3085619
#> Against change in unknown direction 4.6144 5.7948 9.0840        44.2193169
#>                                     Teststatistic 95% Teststatistic 99%
#> Against change from I(0) to I(1)           43.4794337        25.3386256
#> Against change from I(1) to I(0)            0.2290226         0.1290391
#> Against change in unknown direction        34.1387057        20.0073212
Again, the test results suggest that there is a break from \(I(0)\) to \(I(1)\). Consequently, it is not a constant \(I(1)\) process that led to a spurious rejection of the test by Busetti and Taylor (2004).
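Note that the output for type="HLT" contains one test statistic per significance level, so each statistic is compared with the critical value of the same level. A minimal sketch of this comparison, using the values printed above and again assuming an upper-tail rejection rule (the object names are illustrative):

```r
alternatives <- c("I(0) to I(1)", "I(1) to I(0)", "unknown direction")
levels       <- c("90%", "95%", "99%")

# Critical values and level-specific test statistics copied from the output.
crit <- matrix(c(3.5148, 4.6096, 7.5536,
                 3.5588, 4.6144, 7.5304,
                 4.6144, 5.7948, 9.0840),
               nrow = 3, byrow = TRUE,
               dimnames = list(alternatives, levels))
stat <- matrix(c(58.9102128, 43.4794337, 25.3386256,
                  0.3085619,  0.2290226,  0.1290391,
                 44.2193169, 34.1387057, 20.0073212),
               nrow = 3, byrow = TRUE,
               dimnames = list(alternatives, levels))

# Element-wise comparison: statistic at level j versus critical value at level j.
reject_hlt <- stat > crit
reject_hlt
```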
Another test for a change in persistence is that by Martins and Rodrigues (2014). This is more general as it is not restricted to the \(I(0)/I(1)\) framework, but can identify changes from \(I(d_1)\) to \(I(d_2)\) with \(d_1 \neq d_2\) and \(-1/2<d_1,d_2<2\). The default version is applied by
MR_test(x)
#>                                          90%      95%      99% Teststatistic
#> Against increase in memory          4.270666 5.395201 8.233674      16.21494
#> Against decrease in memory          4.060476 5.087265 7.719128       2.14912
#> Against change in unknown direction 5.065695 6.217554 9.136441      16.21494
Again, the function returns a matrix consisting of test statistic and critical values. Here, the alternative of the test is an increase or a decrease in memory, respectively. In line with the results of the ratio test, the approach by Martins and Rodrigues (2014) suggests that the series exhibits an increase in memory, i.e. that the memory of the series increases from \(d_1\) to \(d_2\) with \(d_1<d_2\) at some point in time. Again, this also holds if we consider the critical values that account for the multiple testing problem.
Similar to the ratio test and all other tests against a change in persistence in the memochange package, the MR test also has the arguments trend, tau, simu, and M. Furthermore, we can again choose the type of test statistic. This time we can decide whether to use the squared t-statistic or the standard t-statistic.
MR_test(x, statistic="standard")
#>                                           90%       95%       99% Teststatistic
#> Against increase in memory          -1.637306 -1.920434 -2.504862     -2.880545
#> Against decrease in memory          -1.651586 -1.951420 -2.514165     -1.277410
#> Against change in unknown direction -1.933137 -2.203370 -2.722017     -2.880545
As for the ratio test, changing the type of statistic has a rather small effect on the empirical performance of the test.
If we believe that the underlying process exhibits additional short run components, we can account for these by setting serial=TRUE.
MR_test(x, serial=TRUE)
#> Registered S3 method overwritten by 'quantmod':
#>   method            from
#>   as.zoo.data.frame zoo
#>                                          90%      95%      99% Teststatistic
#> Against increase in memory          4.270666 5.395201 8.233674     10.727202
#> Against decrease in memory          4.060476 5.087265 7.719128      6.758906
#> Against change in unknown direction 5.065695 6.217554 9.136441     10.727202
While the test statistic changes, the conclusion remains the same.
All tests indicate that the oil price series exhibits an increase in memory over time. To correctly model and forecast the series, the exact location of the break is important. This can be estimated by the BP_estim function. It is important for the function that the direction of the change is correctly specified. In our case, an increase in memory has occurred so that we set direction="01".
BP_estim(x, direction="01")
#> $Breakpoint
#> [1] 151
#>
#> $d_1
#> [1] 0.8127501
#>
#> $sd_1
#> [1] 0.08574929
#>
#> $d_2
#> [1] 1.088039
#>
#> $sd_2
#> [1] 0.07142857
This yields a list stating the location of the break (observation 151), semiparametric estimates of the order of integration in the two regimes (0.81 and 1.09) as well as the standard deviations of these estimates (0.09 and 0.07).
Consequently, the function indicates that there is a break in persistence in July 1998. This means that from the beginning of the sample until June 1998 the series is integrated with an order of 0.81 and from July 1998 on the order of integration increased to 1.09.
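The mapping from the observation number to a calendar month can be checked with a few lines of base R. The sketch assumes that the sample runs monthly from January 1986 to August 2019, as specified by the cosd and coed parameters of the download URL; with the downloaded data, oil$observation_date[151] should give the same information.

```r
# Monthly dates of the sample (assumed start and end taken from the URL).
dates <- seq(as.Date("1986-01-01"), as.Date("2019-08-01"), by = "month")

n_obs       <- length(dates)               # number of monthly observations
break_month <- format(dates[151], "%Y-%m") # month of the estimated break point
last_before <- format(dates[150], "%Y-%m") # last month of the first regime

c(n_obs = n_obs, break_month = break_month, last_before = last_before)
```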
As before, the function allows for various types of break point estimators. Instead of the default estimator of Busetti and Taylor (2004), one can also rely on the estimator of Leybourne, Kim, and Taylor (2007) by setting type="LKT". This estimator relies on estimates of the long-run variance. Therefore, m must also be chosen, which determines how many covariances are used when estimating the long-run variance. Leybourne, Kim, and Taylor (2007) suggest m=0.
BP_estim(x, direction="01", type="LKT", m=0)
#> $Breakpoint
#> [1] 148
#>
#> $d_1
#> [1] 0.7660609
#>
#> $sd_1
#> [1] 0.08703883
#>
#> $d_2
#> [1] 1.067404
#>
#> $sd_2
#> [1] 0.07142857
This yields a similar result with the break point lying in the year 1998 and d increasing from approximately 0.8 to approximately 1.
All other arguments of the function (trend, tau, serial) were already discussed above except for d_estim and d_bw. These two arguments determine which estimator and bandwidth are used to estimate the order of integration in the two regimes. Concerning the estimator, the GPH (Geweke and Porter-Hudak (1983)) and the exact local Whittle estimator (Shimotsu and Phillips (2005)) can be selected. Although the exact local Whittle estimator has a lower variance, the GPH estimator is still often considered in empirical applications due to its simplicity. In our example the two estimators yield similar results.
BP_estim(x, direction="01", d_estim="GPH")
#> $Breakpoint
#> [1] 151
#>
#> $d_1
#> [1] 0.855238
#>
#> $sd_1
#> [1] 0.129834
#>
#> $d_2
#> [1] 1.034389
#>
#> $sd_2
#> [1] 0.1468516
The d_bw argument determines how many frequencies are used for estimation. Larger values imply a lower variance of the estimates, but also bias the estimator if the underlying process possesses short run dynamics. Usually a value between 0.5 and 0.8 is considered.
BP_estim(x, direction="01", d_bw=0.75)
#> $Breakpoint
#> [1] 151
#>
#> $d_1
#> [1] 0.9146951
#>
#> $sd_1
#> [1] 0.07624929
#>
#> $d_2
#> [1] 1.173524
#>
#> $sd_2
#> [1] 0.0625
BP_estim(x, direction="01", d_bw=0.65)
#> $Breakpoint
#> [1] 151
#>
#> $d_1
#> [1] 0.5803242
#>
#> $sd_1
#> [1] 0.09805807
#>
#> $d_2
#> [1] 0.9353325
#>
#> $sd_2
#> [1] 0.08219949
In our setup, it can be seen that increasing d_bw to 0.75 does not severely change the estimated order of integration in the two regimes. Decreasing d_bw to 0.65, however, leads to smaller estimates of \(d\).
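The link between d_bw and the variance of the estimates can be illustrated with a short calculation. The sketch assumes that the number of frequencies in a regime with n observations is m = floor(1 + n^d_bw), that the two regimes contain 150 and 254 observations, and that the standard deviation of a local Whittle-type estimator is approximated by 1/(2*sqrt(m)). These are assumptions made for this illustration rather than statements taken from the package documentation, although they reproduce the sd_1 and sd_2 values printed in the exact local Whittle outputs above, with d_bw=0.7 matching the default call.

```r
# Assumed regime lengths: break at observation 151 in a sample of 404.
n_regime   <- c(regime_1 = 150, regime_2 = 254)
bandwidths <- c(0.65, 0.70, 0.75)

# Assumed bandwidth rule: m = floor(1 + n^d_bw) frequencies per regime.
m <- sapply(bandwidths, function(bw) floor(1 + n_regime^bw))
dimnames(m) <- list(names(n_regime), paste0("d_bw=", bandwidths))

# Asymptotic standard deviation of local Whittle-type estimators.
sd_d <- 1 / (2 * sqrt(m))

m
round(sd_d, 4)
```

A larger d_bw increases m and thereby lowers the standard deviation, at the cost of including higher frequencies that may be contaminated by short run dynamics.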