I just finished reading Marcos Lopez de Prado’s chapter on Fractional Differencing in his new book, Advances in Financial Machine Learning. I have 2 questions/concerns about it:
1. If integrated time series are vulnerable to spurious regression, why wouldn’t fractionally integrated time series be “fractionally spurious”?
2. Is it really true that there can be some predictive information in the levels (or “memory”) of the time series which can be captured by fractional differencing, but not integer differencing?
Because my background is machine learning, rather than pure math, I’m going to try to answer these questions using data and empirical evidence, rather than mathematical deduction.
To attempt to answer the first question, I rely on the fact that correlating unrelated random walks with each other (without differencing them first) produces erroneously large correlation coefficients. This is how I will estimate the extent to which fractional differencing reduces (if at all) “spuriousness”. The traditional approach is to transform the fully integrated time series (i.e., I(1)) into a stationary I(0) time series by differencing it once (d = 1). The claim of fractional differencing is that a full first difference isn’t necessary: a fractional value, such as 0.5, is sufficient. Specifically, I will do the following:
a) I will generate 1000 random walks.
b) I will correlate them with each other, as is, in their original I(1) form. I expect to see a certain amount of spurious correlation in the form of correlation coefficients that differ statistically from zero.
c) I will then start differencing these time series with d = 0.1. I will re-run the correlations. I will repeat this with d = 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9 and 1.0 (i.e. traditional differencing of order 1). For each analysis, I will record the mean absolute correlation of the results.
d) I will then plot, for each value of d from 0 to 1, the corresponding mean absolute rho.
Here is the code. Here are the results:
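For readers without access to the linked code, steps (a)–(d) can be sketched as follows. This is my own minimal reconstruction, not the original script: the fixed-width-window `frac_diff` implementation, the window size of 100, and the reduced number and length of the walks (for speed) are all my choices.

```python
import numpy as np

def frac_diff_weights(d, size):
    # Binomial-series weights: w_0 = 1, w_k = -w_{k-1} * (d - k + 1) / k.
    # d = 1 yields [1, -1, 0, ...] (ordinary first difference);
    # d = 0 yields [1, 0, 0, ...] (the identity).
    w = [1.0]
    for k in range(1, size):
        w.append(-w[-1] * (d - k + 1) / k)
    return np.array(w)

def frac_diff(x, d, window=100):
    # Fixed-width-window fractional differencing; the first (window - 1)
    # observations are dropped because they lack a full window of history.
    w = frac_diff_weights(d, window)
    return np.array([np.dot(w, x[i - window + 1:i + 1][::-1])
                     for i in range(window - 1, len(x))])

rng = np.random.default_rng(0)
n_series, n_obs = 50, 500  # scaled down from 1000 walks for speed
walks = rng.standard_normal((n_series, n_obs)).cumsum(axis=1)

off_diag = ~np.eye(n_series, dtype=bool)
for d in np.arange(0.0, 1.01, 0.1):
    diffed = np.array([frac_diff(series, d) for series in walks])
    rho = np.corrcoef(diffed)
    print(f"d = {d:.1f}: mean |rho| = {np.abs(rho[off_diag]).mean():.3f}")
```

With this sketch, the mean absolute pairwise correlation at d = 0 (raw random walks) should be noticeably larger than at d = 1, which is the spurious-correlation effect the experiment measures.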
The relationship between the value of d and the corresponding amount of spurious correlation discovered is not linear. We can see that it drops and quickly reaches diminishing returns at around d = 0.75, where it falls below the threshold of statistical significance (the orange line). This means that there is no significant decrease in spurious correlation associated with increasing d values, once we’ve reached approximately d = 0.75 (in the case of my artificial time series – the plateau might be reached at different values with differently constructed time series).
In other words, it appears that fractional differencing does indeed remove the risk of spurious correlation to the same extent as full differencing does, as long as we stay in that optimal range.
The next question then is: can we gain an improvement in predictive power from not using d = 1 (if the time series has long-term memory)? In an attempt to answer the second question, I will try to build a data-generating process with memory, such that the same model performs more poorly when fitted on an I(0) version of it than when fitted on a fractionally differentiated version of the time series. This will demonstrate that integer differentiation indeed eliminated some predictive component that the fractional differentiation preserved.
The goal of that model will be to predict changes out of sample in the underlying time series, using as predictor lags of the modified (i.e. fully or fractionally differentiated) version of itself. After all, out of sample predictive power is the ultimate test for anything. If it passes this test, I’m happy.
Here is the code. Here are the results (note that, in this chart, the R2 value reported is relative to the R2 obtained by using d = 1, on the same validation set with the same model):
My time series with long-term memory is an Ornstein-Uhlenbeck process with a deterministic trend. The trend is there to make it non-stationary (otherwise, it would already be stationary – no need to difference it!).
On 50,000 artificially generated data points from my Ornstein-Uhlenbeck process with trend, I keep 10% as validation set. I train a simple regression model, where I try to predict the change (d = 1) in the time series at t, given the fractionally differenced value at t - 1.
I fit this model 11 times, using the values of d 0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9 and 1.0. In each case, I also run the Augmented Dickey-Fuller (ADF) test on the differenced time series, to see whether it is considered stationary or not. Notice in the chart that the relative R2 begins to increase after the ADF p-value hits 0. Then, it starts decreasing again until we reach d = 1. In other words: the optimal d value is around 0.5.
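The second experiment can be sketched in the same spirit. Again this is my own reconstruction, scaled down from 50,000 points: the OU parameters (theta, sigma) and trend slope are illustrative guesses, the regression is a plain least-squares line fit via NumPy, and the ADF step from the original setup is omitted here to keep the example dependency-free.

```python
import numpy as np

def frac_diff(x, d, window=100):
    # Fixed-width-window fractional differencing, as in the first sketch.
    w = [1.0]
    for k in range(1, window):
        w.append(-w[-1] * (d - k + 1) / k)
    w = np.array(w)
    return np.array([np.dot(w, x[i - window + 1:i + 1][::-1])
                     for i in range(window - 1, len(x))])

# Ornstein-Uhlenbeck process plus a deterministic trend (the trend makes
# the series non-stationary, as described in the post).
rng = np.random.default_rng(1)
n, theta, sigma, slope, window = 5000, 0.1, 1.0, 0.01, 100
x = np.zeros(n)
for t in range(1, n):
    x[t] = x[t - 1] - theta * x[t - 1] + sigma * rng.standard_normal()
x += slope * np.arange(n)

# Target: the change in x at t; predictor: the frac-diffed value at t - 1.
y = np.diff(x)[window - 1:]
r2 = {}
for d in np.arange(0.0, 1.01, 0.1):
    X = frac_diff(x, d)[:-1]         # aligned one step behind y
    split = int(0.9 * len(y))        # last 10% held out as validation set
    beta = np.polyfit(X[:split], y[:split], 1)
    resid = y[split:] - np.polyval(beta, X[split:])
    r2[round(d, 1)] = 1 - resid.var() / y[split:].var()

for d, v in r2.items():
    print(f"d = {d}: validation R2 = {v:.4f}")
```

The chart in the post reports R2 relative to the d = 1 fit; the sketch above prints absolute validation R2 per d, from which the same comparison can be read off.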
We showed that fractional differencing eliminates spurious regression as thoroughly as full differencing, as long as we choose a reasonable value of d. This optimal value can be determined by running an ADF test (the time series should be sufficiently differenced to pass the test), while simultaneously making sure that the time series is well correlated to the original, I(1) version.
In addition to that, we confirmed the claim that the levels of some time series contain important predictive information which, if not eliminated by full differencing, can be used to produce models of superior out of sample (non-spurious) predictive ability.
Thanks to Mirza Trokic for the Python fractional differencing code!