Identifying Relevant Asset Pricing Time Scales

1. Introduction

Take a look at the figure below, which displays the price level and trading volume of the S&P 500 SPDR over the trading year from July 2012 to July 2013. The solid black line in the top panel shows the price process for the ETF at a daily frequency. Look at how much day-to-day variation there is in the trading volume. You can see that the number of shares traded per day ranges over several orders of magnitude, from roughly 1 \times 10^6 to 300 \times 10^6. What’s more, the red vertical lines in the top figure show the intraday range for the traded price, and these bands span \mathdollar 10 per share in some cases. People are trading this ETF at all sorts of different investment horizons. Any model fit to daily data will ignore really interesting economics operating at these shorter investment horizons. Conversely, any model fit to higher frequency minute-by-minute data will miss out on some longer buy-and-hold decisions.

plot-sp500-spdr-price-scales

In this post, I ask a pair of questions: (a) “Is it possible to recover the most ‘important’ investment horizons from the time series of SPDR prices?” and (b) “What statistical techniques might you use to do this?”

I work in reverse order. After outlining a toy stochastic process with a clear time scale pattern in Section 2, I start the real work in Sections 3 and 4 by discussing a pair of statistical tools you might use to uncover important time scales in this asset market. Here, when you read the word ‘important’ you should think ‘time scales where people are actually making decisions’. In Section 3, I outline the standard approach in asset pricing of using a time series regression at multiple lags. Then, in Section 4, I show how this technique has an equivalent waveform representation. After reading Sections 3 and 4 it may seem like it’s always possible to recover the relevant investment horizons from a financial time series. In Sections 5 and 6, I end on a down note by giving a counter example. The offending stochastic process is a workhorse in financial economics—namely, the Ornstein-Uhlenbeck process. Thus, the answer to question (a) seems to be: No.

You can find all of the code to create the figures below here.

2. Toy Stochastic Process

The next 2 sections discuss different ways of recovering the relevant time scales from a financial data series. In particular, I am interested in the time series of log prices as I don’t want to have to worry about the series going negative. I define returns, r_{t + \Delta t}, as:

(1)   \begin{align*} r_{t + \Delta t} &= \log p_{t+\Delta t} - \log p_t \end{align*}

and assume that both log prices and returns are wide-sense stationary so that:

(2)   \begin{align*} \mathrm{E}[x_t] &= 0 \quad \text{and} \quad \mathrm{E}[x_t \cdot x_{t - h \cdot \Delta t}] = \mathrm{C}(h) \cdot \sigma^2 \end{align*}

for x_t \in \{ \log p_t, r_t \} where \mathrm{C}(h) denotes the h-period ahead autocorrelation function. I use +\Delta t instead of the usual +1 in the time subscripts above because I want to emphasize the fact that the log price and return time series are scale dependent. In the analysis below, I’m going to think about running the analysis at the daily horizon so that \Delta t = 1{\scriptstyle \mathrm{day}}.

plot-multiscale-process

To make this problem concrete, I use a particular numerical example:

(3)   \begin{align*} \log p_t &= \frac{95}{100} \cdot \left( \frac{1}{7} \cdot \sum_{h=1}^7 \log p_{(t - h){\scriptscriptstyle \mathrm{days}}} \right) - \frac{95}{100} \cdot \left( \frac{1}{30} \cdot \sum_{h=1}^{30} \log p_{(t - h){\scriptscriptstyle \mathrm{days}}} \right) + \frac{1}{10} \cdot \varepsilon_t \end{align*}

I plot a year’s worth of daily data from this process (err… I am using calendar time rather than market time… so think 365 not 252) in the plot above. This process says that the log price today will go up by 0.95 units whenever the average log price over the last week was 1 unit higher, but it will go down by 0.95 units whenever the average log price over the last month was 1 unit higher. It’s a nice example to work with because there is an obvious pattern in log prices with a period of just south of 2 months.
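The post’s original simulation code isn’t reproduced here, but a minimal Python sketch of Equation (3) might look as follows; the seed and the 30-day burn-in are my own choices:

```python
import numpy as np

rng = np.random.default_rng(0)

T = 365                       # one calendar year of daily observations
log_p = np.zeros(T + 30)      # 30-day burn-in so the monthly average is defined

# Equation (3): a 7-day moving average pushes the log price up while a
# 30-day moving average pulls it back down, plus a Gaussian shock.
for t in range(30, T + 30):
    weekly = log_p[t-7:t].mean()      # average over lags 1 through 7
    monthly = log_p[t-30:t].mean()    # average over lags 1 through 30
    log_p[t] = 0.95 * weekly - 0.95 * monthly + 0.10 * rng.standard_normal()

log_p = log_p[30:]            # drop the burn-in period
```

Plotting `log_p` should produce the sort of quasi-periodic pattern shown in the figure above.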

3. Autoregressive Representation

The most common way of accounting for time series predictability in asset pricing is to use an autoregression. e.g., you might regress the log price level today on the log price level yesterday, then on the log price level 2 days ago, then on the log price level 3 days ago, and so on, running one such projection for each lag h = 1, 2, \ldots, H:

(4)   \begin{align*} \log p_t &= \beta_h \cdot \log p_{t-h} + \xi_{h,t} \end{align*}

Note that because of the wide-sense stationarity of the log price process the regression coefficients simplify to just the horizon-specific autocorrelation:

(5)   \begin{align*} \beta_h &= \frac{\mathrm{C}(h) \cdot \mathrm{StD}[\log p_t] \cdot \mathrm{StD}[\log p_{t-h}]}{\mathrm{Var}[\log p_{t-h}]} \quad \text{and} \quad \mathrm{StD}[\log p_t] = \mathrm{StD}[\log p_{t-h}] \end{align*}

Why might this approach make sense? First, for the process described in Equation (3), it’s obvious that the log price time series has an autoregressive representation since I constructed it that way. Second and more generally, this approach will hold due to Wold’s Theorem which states that every covariance stationary time series x_t can be written as the sum of 2 time series with the first time series completely deterministic and the second completely random:

(6)   \begin{align*} x_t &= \eta_t + \sum_{h=0}^{\infty} \psi_h \cdot \epsilon_{t-h}, \quad \sum_{h=0}^{\infty} |\psi_h|^2 < \infty \text{ and } \psi_0 = 1 \end{align*}

Here, \eta_t is the completely deterministic time series and \epsilon_t is the completely random white noise time series. The figure below shows the coefficient estimates, \widehat{\mathrm{C}(h)}, from projecting the log price time series onto its past realizations for lags of anywhere from h = 1{\scriptstyle \mathrm{day}} to h = 3{\scriptstyle \mathrm{months}}.

plot-multiscale-autoregression-coefficients
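Because each slope coefficient in Equation (5) reduces to the autocorrelation at that lag, the figure’s estimates can be generated with a loop of univariate projections. Here is a sketch; the simulation of Equation (3) is my own reimplementation, not the post’s code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate the toy process in Equation (3) with a 30-day burn-in.
T = 2000
x = np.zeros(T + 30)
for t in range(30, T + 30):
    x[t] = 0.95 * x[t-7:t].mean() - 0.95 * x[t-30:t].mean() \
        + 0.10 * rng.standard_normal()
x = x[30:]

def lag_coef(x, h):
    """Slope from projecting x_t onto x_{t-h}; equals C(h) under stationarity."""
    y, z = x[h:], x[:-h]
    cov = ((y - y.mean()) * (z - z.mean())).mean()
    return cov / np.var(z)

# One regression per lag, from 1 day out to roughly 3 months.
C_hat = np.array([lag_coef(x, h) for h in range(1, 91)])
```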

4. Waveform Representation

Fun fact: There is also a waveform representation of the same autocorrelation function:

(7)   \begin{align*} \mathrm{C}(h) &= \int_{f \geq 0} \mathrm{S}(f) \cdot e^{i \cdot f \cdot h \cdot \Delta t} \cdot df \end{align*}

This representation will always exist whenever the data have translational symmetry. i.e., put yourself in the role of a trader thinking about buying a share of the S&P 500 SPDR again. If you had to make a prediction about tomorrow’s price level as a function of the log price level today, its value 1 week ago, and its value 1 month ago, you wouldn’t really care whether the current year was 1967, 1984, 1999, or 2013. This is just another way of saying that the autocorrelation coefficients only depend on the time gap.

Where does this alternative representation come from? Why translational symmetry? Plane waves turn out to be the eigenfunctions of the translation operator, \mathrm{T}_\theta[\cdot]:

(8)   \begin{align*} \mathrm{T}_\theta[\mathrm{C}(h)] &= \mathrm{C}(h - \theta) \end{align*}

In the context of this note, the translation operator eats an autocorrelation function and shifts it \theta time periods to the right. i.e., if \mathrm{C}(4{\scriptstyle \mathrm{days}}) gave you the autocorrelation between the log price at any two points in time that are 4 days apart, then \mathrm{T}_{1{\scriptscriptstyle \mathrm{day}}}[\mathrm{C}(4{\scriptstyle \mathrm{days}})] would give you the autocorrelation between the log price at any two points in time that are 3 days apart. Note that the translation operator is linear since translating the sum of functions is the same as summing the translated functions. Thus, just as if \mathrm{T}_\theta[\cdot] were a matrix, we can ask for the eigenfunctions of \mathrm{T}_\theta[\cdot] written as \mathrm{C}_{f}:

(9)   \begin{align*} \mathrm{T}_\theta[\mathrm{C}_f(h)] &= \mathrm{C}_f(h - \theta) = \lambda_{f,\theta} \cdot \mathrm{C}_f(h) \end{align*}

Such eigenfunctions are given by the complex plane waves, with \mathrm{C}_f(h) = e^{i \cdot f \cdot h \cdot \Delta t} and \lambda_{f,\theta} = e^{-i \cdot f \cdot \theta \cdot \Delta t}.
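It’s straightforward to verify this eigenrelation numerically; the particular values of f, \theta, and \Delta t below are arbitrary:

```python
import numpy as np

# Check that the plane wave C_f(h) = exp(i*f*h*dt) is an eigenfunction of
# translation: C_f(h - theta) = lam * C_f(h) with lam = exp(-i*f*theta*dt).
dt, f, theta = 1.0, 0.37, 3.0
h = np.linspace(0.0, 50.0, 500)

C_f = np.exp(1j * f * h * dt)
translated = np.exp(1j * f * (h - theta) * dt)   # T_theta[C_f](h)
lam = np.exp(-1j * f * theta * dt)

assert np.allclose(translated, lam * C_f)
```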

plot-multiscale-process-powerspectrum

As a result, we can think about recovering all the information in the autocorrelation function at horizon h by projecting it onto the eigenfunctions \{ \mathrm{C}_f(h) \}_{f \geq 0} as depicted in the figure above, which is known as a spectral density plot. This figure shows the results of 100 regressions at frequencies in the range [1/100{\scriptstyle \mathrm{days}},1/3{\scriptstyle \mathrm{days}}]:

(10)   \begin{align*} \log p_t &= \hat{a}_f \cdot \sin(f \cdot t) + \hat{b}_f \cdot \cos(f \cdot t) + \xi_{f,t} \end{align*}

Roughly speaking, the coefficients \hat{a}_f and \hat{b}_f capture the strength of fluctuations at the frequency f, in units of 1/\mathrm{days}, in the log price series. Thus, the summary statistic (\hat{a}_f^2 + \hat{b}_f^2)/2 captures how much of the variation in log prices occurs at the frequency f. This statistic is known as the power of the log price series at a particular frequency.
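A sketch of the frequency-by-frequency regressions in Equation (10). The series below is a stand-in with a known 50-day cycle rather than the simulated log price path, and the 2\pi-per-period convention for mapping frequencies into the sine and cosine arguments is my own assumption:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in series: a 50-day cycle plus noise.
t = np.arange(365.0)
x = np.sin(2 * np.pi * t / 50) + 0.1 * rng.standard_normal(t.size)

periods = np.linspace(100, 3, 100)   # 100 days down to 3 days
power = np.empty(periods.size)
for i, p in enumerate(periods):
    f = 2 * np.pi / p
    # Equation (10): one OLS regression per frequency f.
    X = np.column_stack([np.sin(f * t), np.cos(f * t)])
    (a_f, b_f), *_ = np.linalg.lstsq(X, x, rcond=None)
    power[i] = (a_f**2 + b_f**2) / 2

# The power series should peak near the known 50-day period.
```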

The Wiener-Khintchine Theorem formally links these two different ways of looking at the same autocorrelation information:

(11)   \begin{align*} \mathrm{C}(h) &= \sum_{f \geq 0} \mathrm{S}(f) \cdot e^{i \cdot f \cdot h \cdot \Delta t} \cdot \Delta f \quad \text{and} \quad \mathrm{S}(f) = \sum_{h \geq 0} \mathrm{C}(h) \cdot e^{- i \cdot f \cdot h \cdot \Delta t} \cdot \Delta h \end{align*}

Using Euler’s formula that e^{i \cdot x} = \cos(x) + i \cdot \sin(x) and keeping only the real component yields the following mapping from frequency space to autocorrelation space:

(12)   \begin{align*} \mathrm{C}_{\mathrm{WK}}(h) &= \sum_{f=0}^F \left( \frac{\widehat{\mathrm{S}(f)}}{\sum_{f'=0}^F \widehat{\mathrm{S}(f')}} \right) \cdot \cos(f \cdot h \cdot \Delta t) \end{align*}

where I assume that the range [0,F] covers a sufficient amount of the relevant frequency spectrum. The figure below verifies the mathematics by showing the close empirical fit between the two calculations.

plot-multiscale-actual-vs-predicted-autoregression-coefficients
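As an independent sanity check on Equation (12), one can start from a known spectral density instead of an estimated one. For a discrete-time AR(1) process with coefficient \phi, the two-sided spectral density is proportional to (1 - \phi^2)/(1 - 2 \phi \cos \omega + \phi^2) and the true autocorrelation is \phi^h; this AR(1) example is my own illustration, not the post’s:

```python
import numpy as np

phi = 0.8

# Known two-sided AR(1) spectral density on omega in [-pi, pi].
omega = np.linspace(-np.pi, np.pi, 20001)
S = (1 - phi**2) / (1 - 2 * phi * np.cos(omega) + phi**2)

# Equation (12): normalize the spectrum so C_WK(0) = 1, then map back to
# autocorrelation space with a cosine sum.
weights = S / S.sum()
C_wk = np.array([(weights * np.cos(omega * h)).sum() for h in range(10)])

C_true = phi ** np.arange(10)   # AR(1) autocorrelation, for comparison
```

The reconstructed and true autocorrelations agree to numerical precision, which is exactly the close fit the figure above illustrates.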

5. A Counter Example

After giving some tools to mine relevant time scales from financial time series in the previous sections, I conclude by giving an example of a simple stochastic process which thumbs its nose at these tools. Before actually looking at the example, it’s worthwhile to stop for a moment to think about the sort of process which might be hard to handle. You can see glimpses of it in the analysis above. Specifically, note how even though I created the time series in Equation (3) using a 7 day moving average and a 30 day moving average, there is no evidence of these 2 time horizons in the sample autocorrelation coefficients. It’s not as if the figure shows a coefficient of:

(13)   \begin{align*} \widehat{\mathrm{C}(h)} &= \frac{95}{100} \cdot \left( \frac{1}{7} - \frac{1}{30} \right) \end{align*}

for all lags h \leq 7{\scriptstyle \mathrm{days}}. Likewise, the spectral density of the process shows a single peak somewhere between 1/30{\scriptstyle \mathrm{days}} and 1/7{\scriptstyle \mathrm{days}} rather than separate peaks at the weekly and monthly frequencies. Thus, the time scale we see in the raw data is an emergent feature of the interaction of both the weekly and monthly effects. Intuitively, it would be very hard to identify the economically relevant time scale from a stochastic process where interesting features emerge at all time scales.

plot-ou-process

Ornstein and Uhlenbeck gave an example of just such a stochastic process. Take a look at the figure above which plots the following Ornstein-Uhlenbeck (OU) process:

(14)   \begin{align*} d \log p_t &= \theta \cdot \left( \mu - \log p_t \right) \cdot dt + \sigma \cdot d\xi_t, \quad \text{with} \quad \mu = 0, \ \theta = 1 - e^{- \log 2 / 15}, \ \sigma = 1/10 \end{align*}

With dt = 1 day, the equation above reads: “Daily changes in the log price are 0 on average. However, the log price realizes daily kicks with a standard deviation of 1/10th of a log unit, and these kicks have a half life of 15 days.” Thus, it’s natural to think about this OU process as having a relevant time scale on the order of 1 month, and you can see this time scale in the sample log price path. The peaks and troughs in the green line all last somewhere around 1 month.
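A minimal Euler discretization of Equation (14) with dt = 1 day; the discretization scheme and seed are my own choices:

```python
import numpy as np

rng = np.random.default_rng(0)

# OU parameters from Equation (14).
mu, sigma = 0.0, 0.10
theta = 1 - np.exp(-np.log(2) / 15)   # mean reversion with a 15-day half life

T = 365
log_p = np.zeros(T)
for t in range(1, T):
    kick = sigma * rng.standard_normal()
    log_p[t] = log_p[t-1] + theta * (mu - log_p[t-1]) + kick
```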

Here’s the punchline. Even though the process was explicitly constructed to have a relevant monthly time scale, there is no obvious bump at the monthly horizon in either the autoregressive representation or the waveform representation. In fact, OU processes are well known to produce power-law noise: the spectrum is a smooth Lorentzian which decays like 1/f^2 at high frequencies, as shown in the figure below. Kicks which have a half life on the order of 15 days lead to emergent behavior at all time scales!

plot-ou-process-powerspectrum
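To see how smooth this decay is, one can evaluate the OU process’s spectrum directly. The Lorentzian form S(f) \propto 1/(\theta^2 + (2 \pi f)^2) is the standard closed-form spectrum of a continuous-time OU process, used here as an assumption rather than estimated from simulated data:

```python
import numpy as np

theta = 1 - np.exp(-np.log(2) / 15)   # 15-day half life, as in Equation (14)

# Lorentzian spectrum of an OU process, up to a constant of proportionality.
f = np.logspace(-2, 0, 200)           # frequencies in cycles per day
S = 1.0 / (theta**2 + (2 * np.pi * f)**2)

# Well above the mean-reversion rate, the log-log slope approaches -2:
# a featureless power-law tail with no bump at the 1/30 day frequency.
slope = np.polyfit(np.log(f[-50:]), np.log(S[-50:]), 1)[0]
```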

6. Uniqueness of Approximations

Of course, there is a mapping between the precise rate of decay in the figure above and the relevant time scale, but this is beside the point. You would have to know the exact stochastic process in order to reverse engineer the mapping. What’s more, this problem isn’t an issue that will be solved with more advanced filtering techniques such as wavelets. It’s not that the filtering technology is too coarse to capture the real structure. It’s that the real time scale structure created by the OU process itself is incredibly smooth. If you see a price process whose power spectrum mirrors that of an OU process with power-law decay, you can’t be sure whether it’s an OU process with a monthly time scale as above or a process with economic decisions being made at each and every horizon.

This result has to do with the fact that even very well behaved approximations are only unique in a very narrow sense. What do I mean by this? Well, consider asymptotic approximations where the approximation error is smaller than the last term at each level of approximation. i.e., the approximation:

(15)   \begin{align*} f(\epsilon) &\sim \sum_{n=0}^N a_n \cdot f_n(\epsilon) \end{align*}

is asymptotic to f(\epsilon) as \epsilon \to 0 if for each M \leq N:

(16)   \begin{align*} \frac{f(\epsilon) - \sum_{n=0}^M a_n \cdot f_n(\epsilon)}{f_M(\epsilon)} &\to 0 \quad \text{as} \quad \epsilon \to 0 \end{align*}

Asymptotic approximations are well behaved in the sense that you can naively add, subtract, multiply, divide, etc\ldots them just like they were numbers. What’s more, for a given choice of \{ f_n \}_{n \geq 0}, all of the coefficients \{ a_n \}_{n \geq 0} are unique.

At first this uniqueness result looks really promising! However, on closer inspection it’s clear that the result is rather finicky. e.g., the same function can have different asymptotic approximations:

(17)   \begin{align*} \text{as } \epsilon \to 0, \quad \tan(\epsilon) &\sim \epsilon + \frac{1}{3} \cdot \epsilon^3 + \frac{2}{15} \cdot \epsilon^5 \\ &\sim \sin(\epsilon) + \frac{1}{2} \cdot \sin(\epsilon)^3 + \frac{3}{8} \cdot \sin(\epsilon)^5 \\ &\sim \epsilon \cdot \cosh\left( \epsilon \cdot \sqrt{2/3} \right) + \frac{31}{270} \cdot \left( \epsilon \cdot \cosh\left( \epsilon \cdot \sqrt{2/3} \right) \right)^5 \end{align*}
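A quick numerical check on the first expansion in Equation (17): the remainder after the \epsilon^5 term should vanish faster than \epsilon^5 itself as \epsilon \to 0. The particular test points below are arbitrary:

```python
import math

def remainder_ratio(eps):
    """Remainder of tan(eps) after three expansion terms, relative to eps**5."""
    approx = eps + eps**3 / 3 + 2 * eps**5 / 15
    return (math.tan(eps) - approx) / eps**5

# The ratio shrinks as eps shrinks, confirming the o(eps**5) error.
ratios = [abs(remainder_ratio(e)) for e in (0.3, 0.1, 0.03)]
```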

What’s more, different functions can have the same asymptotic approximations:

(18)   \begin{align*} e^{\epsilon} &\sim \sum_{n=0}^{\infty} \frac{\epsilon^n}{n!} \quad \text{as} \quad \epsilon \to 0 \\ e^{\epsilon} + e^{-1/\epsilon} &\sim \sum_{n=0}^{\infty} \frac{\epsilon^n}{n!} \quad \text{as} \quad \epsilon \searrow 0 \end{align*}

What’s really interesting about this last example is that these 2 functions have asymptotic approximations that share an infinite number of terms!

To close the loop, consider these approximation results in the context of the econometric analysis above. What I was doing in these exercises was picking a collection of \{f_n\}_{n \geq 0} and then empirically estimating \{a_n\}_{n \geq 0}. For each choice of approximating functions, I got a unique set of coefficients out. However, the counter example above in Section 5 shows that data generating processes with very different time scales can have very similar approximations. The analysis in this section shows that perhaps this result is not too surprising. A different way of putting this idea is that by choosing an approximation to the data generating process, f(\epsilon), you are factoring the economic content of the series into 2 different components: \{a_n\}_{n \geq 0} and \{f_n\}_{n \geq 0}. If you take a stand on the \{f_n\}_{n \geq 0} terms, the corresponding \{a_n\}_{n \geq 0} will certainly be unique; however, there is no guarantee that these coefficients carry all of the economic information that you want to recover from the data. e.g., the relevant time scale information might be buried in the \{f_n\}_{n \geq 0} series rather than the coefficients \{a_n\}_{n \geq 0}.