Comparing “Explanations” for the iVol Puzzle

1. Motivation

A stock’s idiosyncratic-return volatility is the root-mean-squared error, \mathit{ivol}_{n,t} = \sqrt{ \sfrac{1}{D_t} \cdot \sum_{d_t=1}^{D_t} \varepsilon_{n,d_t}^2}, from the daily regression

(1)   \begin{align*} r_{n,d_t} = \alpha + \beta_{\mathit{Mkt}} \cdot r_{\mathit{Mkt},d_t} + \beta_{\mathit{SmB}} \cdot r_{\mathit{SmB},d_t} + \beta_{\mathit{HmL}} \cdot r_{\mathit{HmL},d_t} + \varepsilon_{n,d_t}. \end{align*}

Ang, Hodrick, Xing, and Zhang (2006) show that stocks with lots of idiosyncratic-return volatility in the previous month have extremely low returns in the current month. To quantify just how low these returns are, I run a cross-sectional regression each month,

(2)   \begin{align*} r_{n,t} &= \mu_{r,t} + \beta_t \cdot \widetilde{\mathit{ivol}}_{n,t-1} + \epsilon_{n,t} \end{align*}

where \widetilde{\mathit{ivol}}_{n,t} = \sfrac{(\mathit{ivol}_{n,t} - \mu_{\mathit{ivol},t})}{\sigma_{\mathit{ivol},t}}. Over the period from January 1965 to December 2012, I estimate that a trading strategy which is long the stocks with higher-than-average idiosyncratic-return volatility in the previous month and short the stocks with lower-than-average idiosyncratic-return volatility in the previous month,

(3)   \begin{align*} \beta_t &= \frac{1}{N \cdot \sigma_{\mathit{ivol},t-1}} \cdot \sum_n \left( \mathit{ivol}_{n,t-1} - \mu_{\mathit{ivol},t-1} \right) \cdot r_{n,t}, \end{align*}

has an average excess return of \langle \beta_t \rangle = -0.98{\scriptstyle \%} per month or -10.05{\scriptstyle \%} per year.

plot--ivol-puzzle--monthly-betas
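
To make Equations (2) and (3) concrete, here is a minimal Python sketch of the monthly estimate. The cross-section is simulated placeholder data rather than the actual WRDS sample, and the variable names are my own; the point is just that \beta_t is the return on a portfolio whose weights are proportional to each stock’s demeaned, lagged idiosyncratic volatility.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical cross-section for one month (not the actual WRDS sample).
N = 2500
ivol_lag = rng.lognormal(mean=-4.0, sigma=0.8, size=N)   # ivol_{n,t-1}
r = rng.normal(loc=0.01, scale=0.10, size=N)             # r_{n,t}

# Equation (3): beta_t is the return on a portfolio whose weights are
# proportional to the demeaned, standardized lagged idiosyncratic volatility.
weights = (ivol_lag - ivol_lag.mean()) / (N * ivol_lag.std())
beta_t = np.sum(weights * r)

# Equivalently, beta_t is the slope from the cross-sectional regression in
# Equation (2) of r_{n,t} on the standardized ivol_{n,t-1}.
ivol_tilde = (ivol_lag - ivol_lag.mean()) / ivol_lag.std()
slope = np.polyfit(ivol_tilde, r, deg=1)[0]

print(beta_t, slope)   # identical up to floating-point error
```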

This is puzzling on two levels. First, standard asset-pricing theory says that traders shouldn’t be compensated for holding diversifiable risk, so it’s surprising that idiosyncratic-return volatility is priced at all. “But, wait a second,” you say. “Maybe there’s some friction that makes it hard to diversify away some of this idiosyncratic risk.” Right. Here’s where the second level comes in. If this were the case, if what we’re calling idiosyncratic-return volatility were somehow non-diversifiable, then you’d expect stocks with lots of idiosyncratic-return volatility to trade at a discount, not a premium. You’d expect these stocks to earn higher returns to compensate traders for holding additional risk, not lower returns. The results in Ang, Hodrick, Xing, and Zhang (2006) are so interesting because they suggest that idiosyncratic-return volatility is not only priced but priced wrong. People don’t just care about idiosyncratic-return volatility; they covet it.

Since 2006 there have been numerous papers that have attempted to explain this puzzling result. But, how can we tell if an explanation is good? In this post, I show how to answer this question using techniques introduced in Hou and Loh (2014). You can find all the code here. The data comes from WRDS.

2. Standard Approach

The standard approach for testing to see if a candidate variable explains the idiosyncratic-return-volatility puzzle involves two stages. In the first stage, people show that the candidate variable is a strong predictor of idiosyncratic-return volatility in a cross-sectional regression,

(4)   \begin{align*} \widetilde{\mathit{ivol}}_{n,t} &= \gamma_t \cdot \widetilde{x}_{n,t} + \xi_{n,t}, \end{align*}

where \widetilde{x}_{n,t} = \sfrac{(x_{n,t} - \mu_{x,t})}{\sigma_{x,t}}. So, for example, if the candidate variable is the maximum daily return in the previous month as suggested in Bali, Cakici, and Whitelaw (2011), then this first-stage regression verifies that the stocks with the highest maximum return in any given month also have the most idiosyncratic-return volatility.

In the second stage, people then run a horse-race regression to see if the candidate variable “drives out” the significance of idiosyncratic-return volatility when predicting monthly returns:

(5)   \begin{align*} r_{n,t} &= \mu_{r,t} + \beta_t \cdot \widetilde{\mathit{ivol}}_{n,t-1} + \delta_t \cdot \widetilde{x}_{n,t-1} + \epsilon_{n,t}. \end{align*}

The idea behind this second-stage regression is simple. If the estimated \langle \beta_t \rangle = 0 and \langle \delta_t \rangle < 0, then the candidate variable explains both a) which stocks had lots of idiosyncratic-return volatility in the previous month and b) which stocks realized very low returns in the current month.

But, what happens if \langle \beta_t \rangle < 0 and \langle \delta_t \rangle < 0, meaning that the candidate variable doesn’t explain all of the idiosyncratic-return-volatility puzzle? How can we tell how much of the idiosyncratic-return-volatility puzzle is explained by a given candidate? This two-stage regression procedure can’t answer this question.

3. Alternative Strategy

To understand how much of the puzzle is explained by, say, a stock’s maximum daily return in the previous month, we need to decompose the excess returns to trading on idiosyncratic-return volatility into two components, namely, the part explained by a stock’s maximum return and everything else:

(6)   \begin{align*} \beta_t &= \mathrm{Cov}[ \,  r_{n,t} ,  \,  \widetilde{\mathit{ivol}}_{n,t-1} \,  ] \\ &= \mathrm{Cov}\left[ \, r_{n,t}, \, \gamma_t \cdot \widetilde{x}_{n,t-1} + \xi_{n,t-1} \, \right] \\ &=  \underbrace{\gamma_t \cdot \mathrm{Cov}\left[ \, r_{n,t}, \, \widetilde{x}_{n,t-1} \, \right]}_{\text{Explained}} + \underbrace{\mathrm{Cov}\left[ \, r_{n,t}, \, \xi_{n,t-1} \, \right]}_{\substack{\text{Everything} \\ \text{Else}}} \end{align*}

This explained part is just the returns to a trading strategy that is long the stocks with a higher-than-average value of the candidate variable in the previous month and short the stocks with a lower-than-average value of the candidate variable in the previous month,

(7)   \begin{align*} \beta_{E,t} &= \gamma_t \cdot \mathrm{Cov}\left[ \, r_{n,t}, \, \widetilde{x}_{n,t-1} \, \right] = \frac{\gamma_t}{N \cdot \sigma_{x,t-1}} \cdot \sum_n \left( x_{n,t-1} - \mu_{x,t-1} \right) \cdot r_{n,t}, \end{align*}

with the portfolio position scaled by the predictive power of the candidate variable over idiosyncratic-return volatility, \gamma_t. Put differently, a candidate variable explains a lot of the idiosyncratic-return-volatility puzzle if it is a good predictor of idiosyncratic-return volatility, \gamma_t > 0, and it generates negative returns when you trade on it, (N \cdot \sigma_{x,t-1})^{-1} \cdot \sum_n( x_{n,t-1} - \mu_{x,t-1}) \cdot r_{n,t} < 0.

If we have an expression for the explained component of the idiosyncratic-return-volatility puzzle, then we can use GMM to estimate it. Let \mathbf{\Theta}_t be a vector of coefficients to be estimated in month t,

(8)   \begin{align*} \mathbf{\Theta}_t  &= \begin{bmatrix} \mu_{r,t} & \beta_t & \gamma_t & \beta_{E,t} \end{bmatrix}, \end{align*}

using the cross-section of all NYSE, AMEX, and NASDAQ stocks in the WRDS database. We can then estimate this (1 \times 4)-dimensional vector of coefficients using the following moment conditions:

(9)   \begin{align*} g_N(\mathbf{\Theta}_t) &= \frac{1}{N} \cdot \sum_n \begin{pmatrix} r_{n,t} - \mu_{r,t} - \beta_t \cdot \widetilde{\mathit{ivol}}_{n,t-1} \\ \left( r_{n,t} - \mu_{r,t} - \beta_t \cdot \widetilde{\mathit{ivol}}_{n,t-1} \right) \cdot \widetilde{\mathit{ivol}}_{n,t-1} \\ \left( \widetilde{\mathit{ivol}}_{n,t-1} - \gamma_t \cdot \widetilde{x}_{n,t-1} \right) \cdot \widetilde{x}_{n,t-1} \\ \beta_{E,t} - \gamma_t \cdot r_{n,t} \cdot \widetilde{x}_{n,t-1} \end{pmatrix}, \end{align*}

where the first two moments estimate the OLS regression in Equation (2), the third moment estimates the OLS regression in Equation (4), and the fourth moment estimates the explained component of the idiosyncratic-return-volatility puzzle as given in Equation (7). When I estimate these 4 coefficients each month for the maximum-return candidate explanation, I find that the explained return is -0.84{\scriptstyle \%} per month, or about \sfrac{0.84}{0.98} \approx 86{\scriptstyle \%} of the total puzzle.

plot--ivol-puzzle--fraction-explained
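
Because the system in Equation (9) is exactly identified, the month-t GMM estimates can be read off directly: \mu_{r,t} and \beta_t come from the OLS regression in Equation (2), \gamma_t from the regression in Equation (4), and \beta_{E,t} from the plug-in formula in Equation (7). The sketch below solves the four moment conditions on simulated placeholder data; none of the numbers correspond to the actual WRDS estimates.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical one-month cross-section: returns, lagged ivol, and the lagged
# candidate variable x (e.g., the previous month's maximum daily return).
N = 2500
x_lag = rng.normal(size=N)
ivol_lag = 0.6 * x_lag + rng.normal(size=N)             # ivol correlated with x
r = -0.01 * ivol_lag + rng.normal(scale=0.10, size=N)   # low returns follow high ivol

# Standardize, as in the text, so each tilde variable has zero mean and unit variance.
ivol_tilde = (ivol_lag - ivol_lag.mean()) / ivol_lag.std()
x_tilde = (x_lag - x_lag.mean()) / x_lag.std()

# Moments 1 and 2: the OLS regression of r_{n,t} on standardized ivol (Equation 2).
mu_r = r.mean()
beta = np.mean((r - mu_r) * ivol_tilde) / np.mean(ivol_tilde**2)

# Moment 3: the OLS regression of standardized ivol on standardized x (Equation 4).
gamma = np.mean(ivol_tilde * x_tilde) / np.mean(x_tilde**2)

# Moment 4: the explained component, beta_E = gamma * (1/N) * sum_n x_tilde * r (Equation 7).
beta_E = gamma * np.mean(r * x_tilde)

print(f"beta = {beta:.4f}, gamma = {gamma:.4f}, beta_E = {beta_E:.4f}")
print(f"fraction of the puzzle explained: {beta_E / beta:.1%}")
```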

4. What Counts As An “Explanation”?

One of the nice things about setting up the problem this way—as opposed to using approximation methods like in the original Hou and Loh (2014) article—is that it makes it really clear what an explanation is. A candidate variable is a good explanation of the idiosyncratic-return-volatility puzzle if you can’t make (or lose) money by trading on idiosyncratic-return volatility that isn’t predicted by the candidate variable:

(10)   \begin{align*} \text{``Everything Else'' term in Equation (6)} &= \frac{1}{N} \cdot \sum_n \left( \widetilde{\mathit{ivol}}_{n,t-1} - \gamma_t \cdot \widetilde{x}_{n,t-1} \right) \cdot r_{n,t}. \end{align*}

Note that this is a very particular definition of “explained”. Usually, when you think about an explanation, you have in mind some causal mechanism. But, that’s not what’s meant here. It’s not as if you could give some stocks higher idiosyncratic-return volatility and lower future returns by randomly assigning them higher maximum returns in the current month and not changing any of their other properties. Clearly, there is some deeper mechanism at play that’s causing both higher idiosyncratic-return volatility and higher maximum returns. But, you can’t trade on the part of idiosyncratic-return volatility that’s not explained by the maximum return.

Impulse-Response Functions for VARs

1. Motivating Example

If you regress the current quarter’s inflation rate, x_t, on the previous quarter’s rate using data from FRED over the period from Q3-1987 to Q4-2014, then you get the AR(1) point estimate,

(1)   \begin{align*} x_t = \underset{(0.09)}{0.31} \cdot x_{t-1} + \epsilon_t, \end{align*}

where the number in parentheses denotes the standard error, and the inflation-rate time series, x_t, has been demeaned. In other words, if the inflation rate is \mathrm{StD}(\epsilon_t) \approx 0.47{\scriptstyle \%} points higher in Q1-2015, then on average it will be 0.31 \times 0.47 \approx 0.14{\scriptstyle \%} points higher in Q2-2015, 0.31^2 \times 0.47 \approx 0.04{\scriptstyle \%} points higher in Q3-2015, and so on… The function that describes the cascade of future inflation-rate changes due to an unexpected 1\sigma shock in period t is known as the impulse-response function.

But, many interesting time-series phenomena involve multiple variables. For example, Brunnermeier and Julliard (2008) show that the house-price appreciation rate, y_t, is inversely related to the inflation rate. If you regress the current quarter’s inflation and house-price appreciation rates on the previous quarter’s rates using demeaned data from the Case-Shiller/S&P Index, then you get:

(2)   \begin{align*} \begin{bmatrix} x_t  \\ y_t \end{bmatrix} =  \begin{pmatrix} \phantom{-}\underset{(0.09)}{0.29} & \underset{(0.02)}{0.01} \\ -\underset{(0.43)}{0.40} & \underset{(0.09)}{0.50} \end{pmatrix} \begin{bmatrix} x_{t-1} \\ y_{t-1} \end{bmatrix} + \begin{pmatrix} \epsilon_{x,t}  \\ \epsilon_{y,t} \end{pmatrix}. \end{align*}

These point estimates indicate that, if the inflation rate were \mathrm{StD}(\epsilon_{x,t}) \approx 0.47{\scriptstyle \%} points higher in Q1-2015, then the inflation rate would be 0.29 \times 0.47 \approx 0.14{\scriptstyle \%} points higher in Q2-2015 and the house-price appreciation rate would be 0.40 \times 0.47 \approx 0.19{\scriptstyle \%} points lower in Q2-2015.

Computing the impulse-response function for this vector auto-regression (VAR) is more difficult than computing the same function for the inflation-rate AR(1) because the inflation rate and house-price appreciation rate shocks are correlated:

(3)   \begin{align*} \mathrm{Cor}(\epsilon_{x,t}, \epsilon_{y,t}) &= 0.13 \neq 0. \end{align*}

In other words, when you see a 1\sigma shock to inflation, you also tend to see a 0.13\sigma shock to the house-price appreciation rate. Thus, computing the future effects of a 1\sigma shock to the inflation rate and a 0{\scriptstyle \%} point shock to the house-price appreciation rate gives you information about a unit shock that doesn’t happen in the real world. In this post, I show how to account for this sort of correlation when computing the impulse-response function for VARs. Here is the relevant code.

2. Impulse-Response Function

Before studying VARs, let’s first define the impulse-response function more carefully in the scalar world. Suppose we have some data generated by an AR(1),

(4)   \begin{align*} x_t &= \gamma \cdot x_{t-1} + \epsilon_t, \end{align*}

where \mathrm{E}[x_t] = 0, \mathrm{E}[\epsilon_t] = 0, and \mathrm{Var}[\epsilon_t] = \sigma^2. For instance, if we’re looking at quarterly inflation data, x_t = \Delta \log \mathit{CPI}_t, then \gamma = 0.31. In this setup, what would happen if there was a sudden 1\sigma shock to x_t in period t? How would we expect the level of x_{t+1} to change? What about the level of x_{t+2}? Or, the level of any arbitrary x_{t+h} for h > 0? How would a \mathrm{StD}(\epsilon_t) \approx 0.47{\scriptstyle \%} point shock to the current inflation rate propagate into future quarters?

Well, it’s easy to compute the time t expectation of x_{t+1}:

(5)   \begin{align*} \mathrm{E}_t[x_{t+1}] &= \mathrm{E}_t \left[ \, \gamma \cdot x_t + \epsilon_{t+1} \, \right] \\ &= \gamma \cdot x_t. \end{align*}

Iterating on this same strategy then gives the time t expectation of x_{t+2}:

(6)   \begin{align*} \begin{split} \mathrm{E}_t[x_{t+2}]  &= \mathrm{E}_t \left[ \, \gamma \cdot x_{t+1} + \epsilon_{t+2} \, \right]  \\ &= \mathrm{E}_t \left[ \, \gamma \cdot \left\{ \, \gamma \cdot x_t + \epsilon_{t+1} \, \right\} + \epsilon_{t+2} \, \right]  \\ &= \gamma^2 \cdot x_t. \end{split} \end{align*}

So, in general, the time t expectation of any future x_{t+h} will be given by the formula,

(7)   \begin{align*} \mathrm{E}_t[x_{t+h}] &= \gamma^h \cdot x_t, \end{align*}

and the impulse-response function for the AR(1) process will be:

(8)   \begin{align*} \mathrm{Imp}(h) &= \gamma^h \cdot \sigma. \end{align*}

If you knew that there was a sudden shock to x_t of size \epsilon_t = +1\sigma, then your expectation of x_{t+h} would change by the amount \mathrm{Imp}(h). The figure below plots the impulse-response function for x_t using the AR(1) point estimate from Equation (1).

plot--cpi-ar1-irf
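
As a quick illustration, here is a minimal Python sketch of Equation (8), treating the point estimate \gamma = 0.31 and the shock size \sigma \approx 0.47 percentage points from the text as given.

```python
import numpy as np

gamma = 0.31   # AR(1) slope from Equation (1)
sigma = 0.47   # standard deviation of the quarterly inflation shock, in pct points

# Equation (8): Imp(h) = gamma^h * sigma is the expected change in x_{t+h}
# following a one-standard-deviation shock in period t.
horizons = np.arange(13)
irf = gamma**horizons * sigma

for h, value in zip(horizons, irf):
    print(f"h = {h:2d} quarters ahead: {value:6.3f} pct points")
```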

There’s another slightly different way you might think about an impulse-response function—namely, as the coefficients to the moving-average representation of the time series. Consider rewriting the data generating process using lag operators,

(9)   \begin{align*} \begin{split} \epsilon_t &= x_t - \gamma \cdot x_{t-1}  \\ &= (1 - \gamma \cdot \mathcal{L}) \cdot x_t, \end{split} \end{align*}

where \mathcal{L} x_t = x_{t-1}, \mathcal{L}^2 x_t = x_{t-2}, and so on… Whenever the slope coefficient is smaller than 1 in absolute value, |\gamma|<1, we know that (1 - \gamma \cdot \mathcal{L})^{-1} = \sum_{h=0}^{\infty}\gamma^h \cdot \mathcal{L}^h, and there exists a moving-average representation of x_t:

(10)   \begin{align*} x_t  &= (1 - \gamma \cdot \mathcal{L})^{-1} \cdot \epsilon_t \\ &= \left( \, 1 + \gamma \cdot \mathcal{L} + \gamma^2 \cdot \mathcal{L}^2 + \cdots \, \right) \cdot \epsilon_t \\ &= \sum_{\ell = 0}^{\infty} \gamma^{\ell} \cdot \epsilon_{t-\ell}. \end{align*}

That is, rather than writing each x_t as a function of a lagged value, \gamma \cdot x_{t-1}, and a contemporaneous shock, \epsilon_t, we can instead represent each x_t as a weighted average of all the past shocks that’ve been realized, with more recent shocks weighted more heavily.

(11)   \begin{align*} x_t &= \sum_{\ell = 0}^{\infty} \gamma^{\ell} \cdot \epsilon_{t-\ell} \end{align*}

If we normalize all of the shocks to have unit variance, then the weights themselves will be given by the impulse-response function:

(12)   \begin{align*} x_t &= \sum_{\ell = 0}^{\infty} \gamma^{\ell} \cdot (\sigma \cdot \sigma^{-1}) \cdot \epsilon_{t-\ell} \\ &= \sum_{\ell = 0}^{\infty} \mathrm{Imp}(\ell) \cdot (\sfrac{\epsilon_{t-\ell}}{\sigma}). \end{align*}

Of course, this is exactly what you’d expect for a covariance-stationary process. The impact of past shocks on the current realized value had better be the same as the impact of current shocks on future values.

3. From ARs to VARs

We’ve just seen how to compute the impulse-response function for an AR(1) process. Let’s now examine how to extend this to the setting where there are two time series,

(13)   \begin{align*} x_t &= \gamma_{x,x} \cdot x_{t-1} + \gamma_{x,y} \cdot y_{t-1} + \epsilon_{x,t}, \\ y_t &= \gamma_{y,x} \cdot x_{t-1} + \gamma_{y,y} \cdot y_{t-1} + \epsilon_{y,t}, \end{align*}

instead of just 1. This pair of equations can be written in matrix form as follows,

(14)   \begin{align*} \underset{\mathbf{z}_t}{ \begin{bmatrix} x_t \\ y_t \end{bmatrix} } &= \underset{\mathbf{\Gamma}}{ \begin{pmatrix} \gamma_{x,x} & \gamma_{x,y} \\ \gamma_{y,x} & \gamma_{y,y} \end{pmatrix} } \underset{\mathbf{z}_{t-1}}{ \begin{bmatrix} x_{t-1} \\ y_{t-1} \end{bmatrix} } + \underset{\boldsymbol \epsilon_t}{ \begin{bmatrix} \epsilon_{x,t} \\ \epsilon_{y,t} \end{bmatrix} }, \end{align*}

where \mathrm{E}[ {\boldsymbol \epsilon}_t ] = \mathbf{0} and \mathrm{E}[ {\boldsymbol \epsilon}_t {\boldsymbol \epsilon}_t^{\top} ] = \mathbf{\Sigma}. For example, if you think about x_t as the quarterly inflation rate and y_t as the quarterly house-price appreciation rate, then the coefficient matrix \mathbf{\Gamma} is given in Equation (2).

Nothing about the construction of the moving-average representation of x_t demanded that x_t be a scalar, so we can use the exact same tricks to write the (2 \times 1)-dimensional vector \mathbf{z}_t as a moving average:

(15)   \begin{align*} \mathbf{z}_t = \sum_{\ell = 0}^{\infty} \mathbf{\Gamma}^{\ell} {\boldsymbol \epsilon}_{t-\ell}. \end{align*}

But, it’s much less clear in this vector-valued setting how we’d recover the impulse-response function from the moving-average representation. Put differently, what’s the matrix analog of \mathrm{StD}(\epsilon_t) = \sigma?

Let’s apply the want operator. This mystery matrix, let’s call it \mathbf{C}_x^{-1}, has to have two distinct properties. First, it’s got to rescale the vector of shocks, {\boldsymbol \epsilon}_t, into something that has a unit norm,

(16)   \begin{align*} 1 &= \mathrm{E}\left[ \, \Vert \mathbf{C}_x^{-1} {\boldsymbol \epsilon}_t \Vert_2 \, \right], \end{align*}

in the same way that \mathrm{StD}(\sfrac{\epsilon_t}{\sigma}) = 1 in the analysis above. This is why I’m writing the mystery matrix as \mathbf{C}_x^{-1} rather than just \mathbf{C}_x. Second, the matrix has to account for the fact that the shocks, \epsilon_{x,t} and \epsilon_{y,t}, are correlated, so that 1\sigma shocks to the inflation rate tend to be accompanied by 0.13\sigma shocks to the house-price appreciation rate. Because the shocks to each variable might have different standard deviations, for instance, \mathrm{StD}(\epsilon_{x,t}) = \sigma_x \approx 0.47{\scriptstyle \%} while \mathrm{StD}(\epsilon_{y,t}) = \sigma_y \approx 2.29{\scriptstyle \%}, the effect of a 1\sigma_x shock to the inflation rate on the house-price appreciation rate, 0.13 \times 2.29 \approx 0.30{\scriptstyle \%}, will be different than the effect of a 1\sigma_y shock to the house-price appreciation rate on the inflation rate, 0.13 \times 0.47 \approx 0.06{\scriptstyle \%}. Thus, each variable in the vector \mathbf{z}_t will have its own impulse-response function. This is why I write the mystery matrix as \mathbf{C}_x^{-1} rather than \mathbf{C}^{-1}.

It turns out that, if we pick \mathbf{C}_x to be the Cholesky decomposition of \mathbf{\Sigma},

(17)   \begin{align*} \mathbf{\Sigma} &= \mathbf{C}_x\mathbf{C}_x^\top, \end{align*}

then \mathbf{C}_x^{-1} will have both of the properties we want as pointed out in Sims (1980). The simple 2-dimensional case is really useful for understanding why. To start with, let’s write out the variance-covariance matrix of the shocks, \mathbf{\Sigma}, as follows,

(18)   \begin{align*} \mathbf{\Sigma} &= \begin{pmatrix} \sigma_x^2 & \rho \cdot \sigma_x \cdot \sigma_y \\ \rho \cdot \sigma_x \cdot \sigma_y & \sigma_y^2 \end{pmatrix} \end{align*}

where \rho = \mathrm{Cor}(\epsilon_{x,t},\epsilon_{y,t}). The Cholesky decomposition of \mathbf{\Sigma} can then be solved by hand:

(19)   \begin{align*} \overbrace{ \begin{pmatrix} \sigma_x^2 & \rho \cdot \sigma_x \cdot \sigma_y \\ \rho \cdot \sigma_x \cdot \sigma_y & \sigma_y^2 \end{pmatrix}}^{=\mathbf{\Sigma}} &= \begin{pmatrix} \sigma_x^2 & \rho \cdot \sigma_x \cdot \sigma_y  \\ \rho \cdot \sigma_x \cdot \sigma_y & \rho^2 \cdot \sigma_y^2 + (1 - \rho^2) \cdot \sigma_y^2 \end{pmatrix} \\ &= \underbrace{ \begin{pmatrix} \sigma_x & 0 \\ \rho \cdot \sigma_y & \sqrt{(1 - \rho^2) \cdot \sigma_y^2} \end{pmatrix}}_{=\mathbf{C}_x} \underbrace{ \begin{pmatrix} \sigma_x & \rho \cdot \sigma_y \\ 0 & \sqrt{(1 - \rho^2) \cdot \sigma_y^2} \end{pmatrix}}_{=\mathbf{C}_x^{\top}} \end{align*}

Since we’re only working with a (2 \times 2)-dimensional matrix, we can also solve for \mathbf{C}_x^{-1} by hand:

(20)   \begin{align*} \mathbf{C}_x^{-1}  &= \frac{1}{(1 - \rho^2)^{\sfrac{1}{2}} \cdot \sigma_x \cdot \sigma_y} \cdot \begin{pmatrix} (1 - \rho^2)^{\sfrac{1}{2}} \cdot \sigma_y & 0 \\ - \rho \cdot \sigma_y & \sigma_x \end{pmatrix} \\ &=  \begin{pmatrix} \frac{1}{\sigma_x} & 0 \\ - \frac{\rho}{(1 - \rho^2)^{\sfrac{1}{2}}} \cdot \frac{1}{\sigma_x} & \frac{1}{(1 - \rho^2)^{\sfrac{1}{2}}} \cdot \frac{1}{\sigma_y} \end{pmatrix} \end{align*}

So, for example, if there is a pair of shocks, {\boldsymbol \epsilon}_t = \begin{bmatrix} \sigma_x & 0 \end{bmatrix}^{\top}, then \mathbf{C}_x^{-1} will convert this shock into:

(21)   \begin{align*} \mathbf{C}_x^{-1}{\boldsymbol \epsilon}_t = \begin{bmatrix} 1 & - \rho \cdot (1 - \rho^2)^{-\sfrac{1}{2}} \end{bmatrix}^{\top}. \end{align*}

In other words, the matrix \mathbf{C}_x^{-1} rescales {\boldsymbol \epsilon}_t to have unit norm, \mathrm{E}\!\left[ \, \mathbf{C}_x^{-1} \, {\boldsymbol \epsilon}_t^{\phantom{\top}}\!{\boldsymbol \epsilon}_t^{\top} (\mathbf{C}_x^{-1})^{\top} \right] = \mathbf{I}, and rotates the vector to account for the correlation between \epsilon_{x,t} and \epsilon_{y,t}. To appreciate how the rotation takes into account the positive correlation between \epsilon_{x,t} and \epsilon_{y,t}, notice that matrix \mathbf{C}_x^{-1} turns the shock {\boldsymbol \epsilon}_t = \begin{bmatrix} \sigma_x & 0 \end{bmatrix}^{\top} into a vector that is pointing 1 standard deviation in the x direction and - \rho \cdot (1 - \rho^2)^{-\sfrac{1}{2}} in the y direction. That is, given that you’ve observed a positive 1\sigma_x shock, observing a 0\sigma_y shock would be a surprisingly low result.
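
If you’d rather not trust the algebra, Equations (19) through (21) are easy to check numerically. The sketch below plugs in the shock moments quoted in the text (\sigma_x \approx 0.47, \sigma_y \approx 2.29, \rho = 0.13), which are taken as given.

```python
import numpy as np

sigma_x, sigma_y, rho = 0.47, 2.29, 0.13

# Variance-covariance matrix of the shocks, Equation (18).
Sigma = np.array([[sigma_x**2,              rho * sigma_x * sigma_y],
                  [rho * sigma_x * sigma_y, sigma_y**2             ]])

# Lower-triangular Cholesky factor, and the by-hand version from Equation (19).
C = np.linalg.cholesky(Sigma)
C_hand = np.array([[sigma_x,       0.0                          ],
                   [rho * sigma_y, np.sqrt(1 - rho**2) * sigma_y]])
print(np.allclose(C, C_hand))        # True

# Equation (21): a shock of [sigma_x, 0]' is rescaled and rotated by C_x^{-1}.
eps = np.array([sigma_x, 0.0])
print(np.linalg.solve(C, eps))       # [1, -rho / sqrt(1 - rho^2)]
```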

If we plug \mathbf{C}_x^{-1} into our moving-average representation of \mathbf{z}_t, then we get the expression below,

(22)   \begin{align*} \mathbf{z}_t  &= \sum_{\ell = 0}^{\infty} \mathbf{\Gamma}^{\ell} (\mathbf{C}_x \mathbf{C}_x^{-1}) \, {\boldsymbol \epsilon}_{t-\ell}, \\ &= \sum_{\ell = 0}^{\infty} \mathrm{Imp}_x(\ell) \, \mathbf{C}_x^{-1} {\boldsymbol \epsilon}_{t-\ell}, \end{align*}

implying that the impulse-response function for x_t is given by:

(23)   \begin{align*} \mathrm{Imp}_x(h) &= \mathbf{\Gamma}^h \mathbf{C}_x. \end{align*}

The figure below plots the impulse-response function for both x_t and y_t implied by a unit shock to x_t using the coefficient matrix from Equation (2).

plot--cpi-vs-hpi-var-irf
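
Putting the pieces together, here is a minimal sketch of Equation (23) in Python. The coefficient matrix comes from Equation (2), and the shock variance matrix is rebuilt from the moments quoted in the text (\sigma_x \approx 0.47, \sigma_y \approx 2.29, \rho = 0.13); both are treated as given rather than re-estimated.

```python
import numpy as np

# VAR(1) coefficient matrix from Equation (2): first row is inflation, second is house prices.
Gamma = np.array([[ 0.29, 0.01],
                  [-0.40, 0.50]])

# Shock variance-covariance matrix built from the moments quoted in the text.
sigma_x, sigma_y, rho = 0.47, 2.29, 0.13
Sigma = np.array([[sigma_x**2,              rho * sigma_x * sigma_y],
                  [rho * sigma_x * sigma_y, sigma_y**2             ]])
C = np.linalg.cholesky(Sigma)   # lower-triangular C_x, with inflation ordered first

# Equation (23): Imp_x(h) = Gamma^h C_x. Its first column is the response of
# [x_{t+h}, y_{t+h}] to a one-standard-deviation orthogonalized shock to x_t.
for h in range(9):
    imp = np.linalg.matrix_power(Gamma, h) @ C
    print(f"h = {h}: d(inflation) = {imp[0, 0]:+.3f}, d(house prices) = {imp[1, 0]:+.3f}")
```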

Bias in Time-Series Regressions

1. Motivation

How persistent has IBM’s daily trading volume been over the last month? How persistent have Apple’s monthly stock returns been over the last 5 years of trading? What about the US’s annual GDP growth over the last century? To answer these questions, why not just run an OLS regression,

(1)   \begin{align*} x_t = \widehat{\alpha} + \widehat{\rho} \cdot x_{t-1} + \widehat{\epsilon}_t, \end{align*}

where \widehat{\rho} denotes the estimated auto-correlation of the relevant data series? The Gauss-Markov Theorem says that an OLS regression will give a consistent unbiased estimate of the persistence parameter, \rho, right?

Wrong.

Although OLS estimates are still consistent when using time-series data (i.e., they converge to the correct value as the number of observations increases), they are no longer unbiased in finite samples (i.e., they may be systematically too large or too small when looking at T = 100 rather than T \to \infty observations). To illustrate the severity of this problem, I simulate data of lengths T = 21, 60, and 100,

(2)   \begin{align*} x_t = 0 + \rho \cdot x_{t-1} + \epsilon_t  \qquad  \epsilon_t \overset{\scriptscriptstyle \mathrm{iid}}{\sim} \mathrm{N}(0,1), \end{align*}

and estimate the simple auto-regressive model from Equation (1) to recover \rho. The figure below shows the results of this exercise. The left-most panel reveals that, when the true persistence parameter approaches one, \rho \nearrow 1, the bias approaches -0.20, or \sfrac{1}{5} of the true coefficient size. In other words, if you simulate a time series of T=21 data points using \rho = 1, then you’ll typically estimate a \widehat{\rho} = 0.80!

plot--ar1-coef-bias
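
Here is a minimal Monte Carlo sketch of the exercise behind the left-most panel. It simulates many short samples from Equation (2) with \rho = 1, runs the OLS regression in Equation (1) on each one, and reports the average estimate; the seed and the number of replications are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(42)

def ols_ar1_slope(x):
    """OLS slope from regressing x_t on a constant and x_{t-1}."""
    y, x_lag = x[1:], x[:-1]
    X = np.column_stack([np.ones_like(x_lag), x_lag])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

T, rho, n_sims = 21, 1.0, 20_000
estimates = np.empty(n_sims)
for i in range(n_sims):
    x = np.zeros(T)
    for t in range(1, T):
        x[t] = rho * x[t - 1] + rng.standard_normal()
    estimates[i] = ols_ar1_slope(x)

# With only T = 21 observations, the average estimate falls well short of rho = 1.
print(f"true rho = {rho}, average OLS estimate = {estimates.mean():.2f}")
```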

What is it about time series data that induces the bias? Why doesn’t this problem exist in a cross-sectional regression? How can it exist even when the true coefficient is \rho = 0? This post answers these questions. All of the code can be found here.

2. Root of the Problem

Here is the short version. The bias in \widehat{\rho} comes from having to estimate the sample average of the time series:

(3)   \begin{align*} \widehat{\mu} &= \frac{1}{T} \cdot \sum_t x_t = \frac{1}{T} \cdot \sum_t \left( \frac{1 - \rho^{T - (t-1)}}{1 - \rho} \right) \cdot \epsilon_t. \end{align*}

If you knew the true mean, \mu, then there’d be no bias in \widehat{\rho}. Moreover, the bias goes away as you see more and more data (i.e., the estimator is consistent) because your estimated mean gets closer and closer to the true mean, \lim_{T \to \infty} (\widehat{\mu} - \mu)^2 = 0. Let’s now dig into why not knowing the mean of a time series induces a bias in the OLS estimate of the slope.

Estimating the coefficients in Equation (1) means choosing the parameters \widehat{\mu} and \widehat{\rho} to minimize the mean squared error between the left-hand-side variable, x_t, and its fitted value based on the right-hand-side variable, x_{t-1}:

(4)   \begin{align*} \min_{\{\widehat{\mu},\widehat{\rho}\}} \, \frac{1}{T} \cdot \sum_t \left( \, x_t - \left\{ \widehat{\mu} + \widehat{\rho} \cdot (x_{t-1} - \widehat{\mu}) \right\} \, \right)^2. \end{align*}

Any parameter choice, \widehat{\rho}, that minimizes this error must also satisfy the first-order condition below:

(5)   \begin{align*} 0 &=  - \, \frac{1}{T} \cdot \sum_t \left( \, x_t - \left\{ \widehat{\mu} + \widehat{\rho} \cdot (x_{t-1} - \widehat{\mu}) \right\} \, \right) \cdot (x_{t-1} - \widehat{\mu}). \end{align*}

Substituting in the true functional form, x_t = \mu + \rho \cdot x_{t-1} + \epsilon_t, then gives:

(6)   \begin{align*} 0 &=  - \, \frac{1}{T} \cdot \sum_t \left( \, \left\{\mu + \rho \cdot x_{t-1} + \epsilon_t \right\} - \left\{ \widehat{\mu} + \widehat{\rho} \cdot (x_{t-1} - \widehat{\mu}) \right\} \, \right) \cdot (x_{t-1} - \widehat{\mu}). \end{align*}

From here, it’s easy to solve for the expected difference between the estimated slope coefficient, \widehat{\rho}, and the true slope coefficient, \rho:

(7)   \begin{align*} \mathrm{E}\!\left[ \, \widehat{\rho} - \rho \, \right] &=  \mathrm{E} \left[ \, \frac{ \frac{1}{T} \cdot \sum_t ( \epsilon_t - \{\widehat{\mu} - \mu\} ) \cdot (x_{t-1} - \widehat{\mu}) }{ \frac{1}{T} \cdot \sum_t (x_{t-1} - \widehat{\mu})^2 } \, \right]. \end{align*}

Note that \widehat{\epsilon}_t = \epsilon_t - \{\widehat{\mu} - \mu\} is just the regression residual. So, this equation says that the estimated persistence parameter, \widehat{\rho}, will be too high if big \widehat{\epsilon}_t‘s tend to follow periods in which x_{t-1} is above its mean. Conversely, your estimated persistence parameter, \widehat{\rho}, will be too low if big \widehat{\epsilon}_t‘s tend to follow periods in which x_{t-1} is below its mean. How can estimating the time-series mean, \widehat{\mu}, induce this correlation while knowing the true mean, \mu, not?

Clearly, we need to compute the average of the time series given in Equation (3). For simplicity, let’s assume that the true mean is \mu = 0 and the initial value is x_0 = 0. Under these conditions, each successive term of the time series is just a weighted average of shocks:

(8)   \begin{align*} x_t &= \sum_{s=1}^t \rho^{t-s} \cdot \epsilon_s. \end{align*}

So, the sample average given in Equation (3) must contain information about future shock realizations, \{\epsilon_t, \epsilon_{t+1}, \ldots, \epsilon_T\}. Consider the logic when the true persistence parameter is positive, \rho > 0, and the true mean is zero, \mu = 0. If the current period’s realization of x_{t-1} is below the estimated mean, \widehat{\mu}, then future \epsilon_t‘s have to be above the estimated mean by definition—otherwise, \widehat{\mu} wouldn’t be the mean. Conversely, if the current period’s realization of x_{t-1} is above the estimated mean, then future \epsilon_t‘s have to be below the estimated mean. As a result, the sample covariance between (x_{t-1} - \widehat{\mu}) and (\epsilon_t - \widehat{\mu}) must be negative:

(9)   \begin{align*} 0 > \frac{1}{T} \cdot \sum_t (\epsilon_t - \widehat{\mu}) \cdot (x_{t-1} - \widehat{\mu}). \end{align*}

So, when the true slope parameter is positive, the OLS estimate will be biased downward.

3. Cross-Sectional Regressions

Cross-sectional regressions don’t have this problem because estimating the mean of the right-hand-side variable,

(10)   \begin{align*} \widehat{\mu}_x = \frac{1}{N} \cdot \sum_n x_n, \end{align*}

doesn’t tell you anything about the error terms. For example, imagine you had N data points generated by the following model,

(11)   \begin{align*} y_n = \mu_y + \beta \cdot (x_n - \mu_x) + \epsilon_n \qquad x_n \overset{\scriptscriptstyle \mathrm{iid}}{\sim} \mathrm{N}(\mu_x, \sigma_x^2) \qquad \epsilon_n \overset{\scriptscriptstyle \mathrm{iid}}{\sim} \mathrm{N}(0, \sigma_{\epsilon}^2), \end{align*}

where each observation of the right-hand-side variable, x_n, is independently drawn from the same distribution. In this setting, the slope coefficient from the associated cross-sectional regression,

(12)   \begin{align*} y_n &= \widehat{\mu}_y + \widehat{\beta} \cdot (x_n - \widehat{\mu}_x) + \widehat{\epsilon}_n, \end{align*}

won’t be biased because \widehat{\mu}_x isn’t a function of any of the error terms, \{\epsilon_1, \epsilon_2, \ldots, \epsilon_N\}:

(13)   \begin{align*} \left( \frac{1}{N} \cdot \sum_n x_n \right) \perp \epsilon_n. \end{align*}

So, estimating the mean won’t induce any covariance between the residuals, \widehat{\epsilon}_n = \epsilon_n - \{ \widehat{\mu}_y - \mu_y \}, and the right-hand-side variable, x_n - \widehat{\mu}_x. All the conditions of the Gauss-Markov Theorem hold. If you only have a small number of observations, then \widehat{\beta} may be a noisy estimate, but at least it will be unbiased.
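
For contrast, here is a small Monte Carlo sketch (with arbitrary parameter choices) showing that the cross-sectional slope estimate stays centered on the true \beta even with very few observations.

```python
import numpy as np

rng = np.random.default_rng(3)

N, beta, mu_x, mu_y = 21, 0.5, 1.0, 2.0
estimates = []
for _ in range(20_000):
    x = rng.normal(mu_x, 1.0, size=N)                 # x_n drawn iid, as in Equation (11)
    y = mu_y + beta * (x - mu_x) + rng.normal(size=N)
    X = np.column_stack([np.ones(N), x])
    estimates.append(np.linalg.lstsq(X, y, rcond=None)[0][1])

# Noisy in any single sample, but unbiased on average, even with N = 21.
print(f"true beta = {beta}, average OLS estimate = {np.mean(estimates):.3f}")
```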

4. Bias At Zero

One of the more interesting things about the slope-coefficient bias in time-series regressions is that it doesn’t disappear when the true parameter value is \rho = 0. For instance, in the figure above, notice that the expected bias disappears at \rho = -\sfrac{1}{3} and is negative at \rho = 0. Put differently, if you estimated the correlation between Apple’s returns in successive months and found a parameter value of \widehat{\rho} = -0.03, then the true coefficient is likely \rho = 0. In fact, Kendall (1954) derives an approximate expression for this bias when the true data generating process is an \mathrm{AR}(1):

(14)   \begin{align*} \mathrm{E}[ \, \widehat{\rho} - \rho \, ] &= - \, \frac{1 + 3 \cdot \rho}{T} \end{align*}
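
A quick Monte Carlo sketch (arbitrary seed and replication count) makes it easy to compare the average bias at \rho = 0 with Kendall’s approximation, -(1 + 3\rho)/T.

```python
import numpy as np

rng = np.random.default_rng(7)

def ols_ar1_slope(x):
    """OLS slope from regressing x_t on a constant and x_{t-1}."""
    y, x_lag = x[1:], x[:-1]
    X = np.column_stack([np.ones_like(x_lag), x_lag])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

T, rho, n_sims = 60, 0.0, 50_000
# With rho = 0, each simulated series is just white noise, x_t = eps_t.
estimates = [ols_ar1_slope(rng.standard_normal(T)) for _ in range(n_sims)]

print(f"simulated bias    = {np.mean(estimates):+.4f}")
print(f"Kendall's formula = {-(1 + 3 * rho) / T:+.4f}")
```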

A simple two-period example illustrates why this is so. Imagine a world where the true coefficient is \rho = 0, and you see the pair of data points in the left panel of the figure below, with the first observation lower than the second. If \mu = 0 as well, then we have that:

(15)   \begin{align*} \widehat{\mu} = \sfrac{x_1}{2} + \sfrac{x_2}{2} = \sfrac{\epsilon_1}{2} + \sfrac{\epsilon_2}{2}. \end{align*}

I plot this sample mean with the green dashed line. In the right panel of the figure, I show the distances (x_1 - \widehat{\mu}) and (\epsilon_2 - \widehat{\mu}) in red and blue respectively. Clearly, if the first observation, x_1, is below the line, then the second observation is above the line. But, since \rho = 0, the second observation is just x_2 = \epsilon_2, so you will observe a negative correlation between (x_1 - \widehat{\mu}) and (\epsilon_2 - \widehat{\mu}). \widehat{\rho} will be downward biased.

bias-in-ols-slope

When Can Arbitrageurs Identify a Sporadic Pricing Error?

1. Motivation

Imagine you’re an arbitrageur and you see a sequence of abnormal returns:

(1)   \begin{align*} \mathit{ra}_t \overset{\scriptscriptstyle \mathrm{iid}}{\sim} \begin{cases}  +1 &\text{w/ prob } \sfrac{1}{2} \cdot (1 + \alpha) \\ -1 &\text{w/ prob } \sfrac{1}{2} \cdot (1 - \alpha) \end{cases} \qquad \text{with} \qquad \alpha \in [-1,1] \end{align*}

Here, \alpha denotes the stock’s average abnormal returns, so the stock’s mispriced if \alpha \neq 0. Suppose you don’t initially know whether or not the stock is priced correctly, whether or not \alpha = 0, but as you see more and more data you refine your beliefs, \widehat{\alpha}_T. In fact, your posterior variance disappears as T \to \infty (see footnote 1):

(2)   \begin{align*} \mathrm{Var}(\widehat{\alpha}_T - \alpha) \asymp T^{-1} \end{align*}

So, such pricing errors can’t persist forever unless there’s some limit to arbitrage, like trading costs or short-sale constraints, to keep you from trading them away.

However, pricing errors often don’t persist period after period. Instead, they tend to arrive sporadically, affecting the first period’s returns, skipping the next two periods’ returns, affecting the fourth and fifth periods’ returns, and so on… What’s more, arbitrageurs don’t have access to an oracle. They don’t know ahead of time which periods are affected and which aren’t. They have to figure this information out on the fly, in real time. In this post, I show that arbitrageurs with an infinite time series may never be able to identify a sporadic pricing error because they don’t know ahead of time where to look.

2. Sporadic Errors

What does it mean to say that a pricing error is sporadic? Suppose that, in each trading period, there is a key state variable, s_t \in \{0,1\}. If s_t = 0, then the stock’s abnormal returns are drawn from a distribution with \alpha = 0; whereas, if s_t = 1, then the stock’s abnormal returns are drawn from a distribution with \alpha \neq 0. Let \theta_t denote the probability that s_t = 1 in a given trading period:

(3)   \begin{align*} \theta_t &= \mathrm{Pr}\left[ \, s_t = 1 \, \middle| \, s_0 = 1 \, \right] \end{align*}

So, \theta_t = 1 for every t \geq 0 in the example above where the stock always has a mean abnormal return of \alpha. By contrast, if \theta_t < 1 in every trading period, then the pricing error is sporadic.

plot--renewal-process

You can think about this state variable in a number of ways. For instance, perhaps the market conditions have to be just right for the arbitrage opportunity to exist. In the figure above, I model the distance to this Goldilocks zone as a reflected random walk, x_t:

(4)   \begin{align*} x_t &=  \begin{cases} x_{t-1} + 1 &\text{w/ prob } 52{\scriptstyle \%} \\ \max\{x_{t-1} - 1, 0\}  &\text{w/ prob } 48{\scriptstyle \%} \end{cases} \end{align*}

Then, every time x_t hits the origin, the mispricing occurs. Thought of in this way, the state variable represents a renewal process.
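
Here is a minimal sketch of this renewal process: a reflected random walk that follows Equation (4), with the mispricing state switched on whenever the walk sits at the origin. The horizon and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(11)

T = 10_000
x = np.zeros(T, dtype=int)
for t in range(1, T):
    step = 1 if rng.random() < 0.52 else -1       # up with prob 52%, down with prob 48%
    x[t] = max(x[t - 1] + step, 0)                # reflected at the origin, Equation (4)

s = (x == 0).astype(int)                          # s_t = 1 whenever the walk hits the origin
print(f"fraction of periods with a pricing error: {s.mean():.3f}")
print(f"first few renewal times: {np.flatnonzero(s)[:10]}")
```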

3. Inference Problem

Suppose you see an infinitely long time series of abnormal returns. Let f_0(\{ \mathit{ra}_t \}) denote the distribution of abnormal returns when \alpha = 0 always, and let f_a(\{ \mathit{ra}_t \}) denote the distribution of abnormal returns when \alpha \neq 0 sometimes. So, if abnormal returns are drawn from f_0, then there are no pricing errors; whereas, if abnormal returns are drawn from f_a, then there are some pricing errors. Here’s the question: When can you conclusively tell whether the data was drawn from f_0 rather than f_a?

If a trader can perfectly distinguish between the pair of probability distributions, f_0 and f_a, then, when you give him any randomly selected sequence of abnormal returns, \{ \mathit{ra}_t \}, he will look at it and go, “That’s from distribution f_0,” or “That’s from distribution f_a.” He will never be stumped. He will never need more information. Mutual singularity is the mathematical way of phrasing this simple idea. Let \Omega denote the set of all possible infinite abnormal return sequences:

(5)   \begin{align*} \Omega &= \left\{ \, \{ \mathit{ra}_t \}, \, \{ \mathit{ra}_t' \}, \, \{ \mathit{ra}_t'' \}, \, \ldots \,  \right\} \end{align*}

We say that a pair of distributions, f_0 and f_a, are mutually singular if there exist disjoint sets, \Sigma_0 and \Sigma_a, whose union is \Omega such that f_0(\{ \mathit{ra}_t \}) = 0 for all sequences \{ \mathit{ra}_t \} \in \Sigma_a while f_a(\{ \mathit{ra}_t \}) = 0 for all sequences \{ \mathit{ra}_t \} \in \Sigma_0. If f_0 and f_a are mutually singular then we can write f_0 \perp f_a. For example, if the alternative hypothesis is that \alpha = 0.10 every day, then f_0 \perp f_a since we know that all abnormal return sequences drawn from f_0 have a mean of exactly 0 as T \to \infty while all abnormal return sequences drawn from f_a have a mean of exactly \alpha = 0.10 as T \to \infty.

At the other extreme, you could imagine a trader being completely unable to tell a pair of distributions apart. It might be the case that any sequence of abnormal returns that is off limits in distribution f_0 is also off limits in distribution f_a. That is, you might never be able to find a sequence of returns that you could use to reject the null hypothesis. Absolute continuity is the mathematical way of phrasing this idea. A distribution f_a is absolutely continuous with respect to f_0 if f_0(\{ \mathit{ra}_t \}) = 0 implies that f_a(\{ \mathit{ra}_t \})=0. This is written as f_0 \gg f_a.

In this post I want to know: for what kind of sporadic pricing errors is f_0 \gg f_a? When can a trader never be completely sure that he’s seen a pricing error and not just an unlikely set of market events?

4. Main Results

Now for the two main results. Harris and Keane (1997) show that, i) if the pricing error happens frequently enough, then traders can identify it regardless of how small it is:

(6)   \begin{align*} \sum_{t=0}^{\infty} \theta_t^2 = \infty  \qquad \Rightarrow \qquad f_0 \perp f_a \end{align*}

For instance, suppose that a stock’s abnormal returns are drawn from a distribution with a really small \alpha > 0 every single period (i.e., \theta_t = 1 for all t \geq 0). Then, just like standard statistical intuition would suggest, traders with enough data will eventually identify this tiny pricing error since \sum_{t=0}^\infty 1 = \infty.

By contrast, ii) if the pricing error is small and rare enough, then traders with an infinite amount of data will never be able to reject f_0. They will never be able to conclusively know that there was a pricing error:

(7)   \begin{align*}  \sum_{t=0}^{\infty} \theta_t^2 &< \sfrac{1}{\alpha^2} \qquad \Rightarrow \qquad f_0 \gg f_a \end{align*}

Again, f_0 \gg f_a means that traders can’t find a sequence of abnormal returns which would only have been possible under f_a. This is a bit of a strange result. To illustrate, suppose that you and I both see a suspicious abnormal return at time t=0. But, while you think it’s due to a sporadic arbitrage opportunity of size \alpha, I think it was just a random market fluctuation. If the probability that this pricing error recurs shrinks quickly enough over time,

(8)   \begin{align*} \theta_t \asymp \sfrac{1}{t} \qquad \text{so that} \qquad \sum_{t=0}^{\infty} \theta_t^2 < \sfrac{1}{\alpha^2}, \end{align*}

then no amount of additional data will enable us to conclusively settle our argument. There can be no smoking gun. We’ll just have to agree to disagree. This result runs against the standard Harsanyi doctrine which says that people who see the same information will end up with the same beliefs.

5. Proof Sketch

I conclude by sketching the proof for part ii) of Harris and Keane (1997)‘s main result: if the pricing error is sufficiently small and rare, then traders will never be able to reject f_0. To do this, I need to be able to show that, if \sum \theta_t^2 < \sfrac{1}{\alpha^2}, then every sequence of abnormal returns with the property that f_0(\{ \mathit{ra}_t \}) = 0 also has the property that f_a(\{ \mathit{ra}_t \}) = 0. The easiest way to do this is to look at the behavior of the following integral:

(9)   \begin{align*} \int_\Omega \left( \, \frac{f_a(\{ \mathit{ra}_t \})}{f_0(\{ \mathit{ra}_t \})} \, \right)^2 \cdot dF_0(\{ \mathit{ra}_t \}) \end{align*}

If it’s finite, then every time f_0(\{ \mathit{ra}_t \}) = 0 it must also be the case that f_a(\{ \mathit{ra}_t \}) = 0. Otherwise, you’d be dividing a positive number, f_a(\{ \mathit{ra}_t \})^2, by 0.

So, let’s examine this integral. At its core, it is just an average over possible mispricing paths, weighting each path by a factor of (1 + \alpha^2) for every period in which a mispricing occurs, since dF_0 = f_0:

(10)   \begin{align*} \int_\Omega \left( \, \frac{f_a(\{ \mathit{ra}_t \})}{f_0(\{ \mathit{ra}_t \})} \, \right)^2 \cdot dF_0(\{ \mathit{ra}_t \})  &= \int_{\{0,1\}^\infty} \prod_{t=1}^\infty (1 + \alpha^2 \cdot s_t^2 ) \cdot d\Theta(\mathbf{s}) \end{align*}

If we define J as the number of periods in which there is a pricing error,

(11)   \begin{align*} J &= \sum_{t=1}^\infty s_t^2, \end{align*}

then we can further bound this integral as follows since s_t^2 \in \{0,1\}:

(12)   \begin{align*} \int_{\{0,1\}^\infty} \prod_{t=1}^\infty (1 + \alpha^2 \cdot s_t^2 ) \cdot d\Theta(\mathbf{s}) &\leq \sum_{j=1}^\infty (1 + \alpha^2)^j \cdot \mathrm{Pr}[J = j] \leq \sum_{j=1}^\infty (1 + \alpha^2)^j \cdot \mathrm{Pr}[J \geq 1]^{j-1} \end{align*}

However, because the pricing-error periods form a renewal process, the probability that there is at least 1 period in which a pricing error occurs is pinned down by the expected number of such periods, \sum_{t=0}^\infty \theta_t^2:

(13)   \begin{align*} \mathrm{Pr}[J \geq 1] &= \frac{\sum_{t=0}^\infty \theta_t^2}{1 + \sum_{t=0}^\infty \theta_t^2} \end{align*}

The geometric sum in Equation (12) is finite whenever (1 + \alpha^2) \cdot \mathrm{Pr}[J \geq 1] < 1, and plugging in Equation (13) shows that this is exactly the condition in Equation (7). So, we have our desired result. That is, f_0 \gg f_a whenever:

(14)   \begin{align*} \alpha^2 \cdot \sum_{t=0}^\infty \theta_t^2 < 1 \end{align*}

  1. e.g., see the computation for the beta-binomial model.

Why Not Fourier Methods?

1. Motivation

There are many ways that you might measure the typical horizon of a stock’s demand shocks. For instance, Fourier methods might at first appear to be a promising approach, but first impressions can be deceiving. Here’s why: spikes in trading volume tend to be asynchronous. For example, you might see a 1-hour burst of trading activity starting at 9:37am, then another starting at 11:03am, and a third at 2:42pm, but you’d never see bursts of trading activity arriving and subsiding every hour like clockwork. Wavelet methods, as described in my earlier post, can handle these kinds of asynchronous shocks. Fourier methods can’t.

2. Spectral Analysis

Suppose there’s a minute-by-minute trading-volume time series that realizes hour-long shocks. To recover the 1-hour horizon from this time series, Fourier analysis tells us to estimate a collection of regressions at frequencies ranging from 1 cycle per month to 1 cycle per minute:

(1)   \begin{align*}   \mathit{vlm}_t - \langle \mathit{vlm}_t\rangle &= \alpha_f \cdot \sin(2 \cdot \pi \cdot f \cdot t) + \beta_f \cdot \cos(2 \cdot \pi \cdot f \cdot t) + \varepsilon_t,   \qquad \text{with } f \in [\sfrac{1}{22},390] \end{align*}

A frequency of f = 390 cycles per day, for example, denotes the 1-minute time horizon since there are 6.5 \, \sfrac{\mathrm{hr}}{\mathrm{day}} \times 60 \, \sfrac{\mathrm{min}}{\mathrm{hr}} = 390 minutes in a trading day. The amount of variation at a particular frequency is then proportional to the power of that frequency, S_f, defined as:

(2)   \begin{align*}   S_f &=  \frac{\alpha_f^2 + \beta_f^2}{2} \end{align*}

If the time series realizes hour-long shocks, then shouldn’t we find a peak in the power of the series at the hourly horizon, f = 6.5? Isn’t this what computing the power of a time series at a particular frequency is designed to capture?

3. Asynchronous Shocks

Yes, but only if the shocks come at regular intervals. For instance, if the first 60 minutes realized a positive shock, minutes 61 through 120 realized a negative shock, minutes 121 through 180 realized a positive shock again, and so on… then Fourier analysis would be the right approach. However, trading-volume shocks have irregular arrival times and random signs. Fourier analysis can’t handle this sort of asynchronous structure.

4. Simulation-Based Example

Let’s consider a short example to solidify this point. We simulate a month-long time series of minute-by-minute trading-volume data with 60-minute shocks by first randomly selecting J = 1000 minutes during which hour-long jumps begin, \{ \tau_1, \tau_2, \tau_3, \cdots, \tau_J\}, and then adding white noise, \varepsilon_t \overset{\scriptscriptstyle \mathrm{iid}}{\sim} \mathrm{N}(0,1):

(3)   \begin{align*}   \mathit{vlm}_t - \langle\mathit{vlm}_t\rangle &= \sigma \cdot \varepsilon_t + \sum_{j=1}^{1000} \lambda \cdot 1_{\{ t \in [\tau_j,\tau_j + 59] \}}   \cdot    \begin{cases}      +1 &\text{w/ prob. } 50{\scriptstyle \%}     \\     -1 &\text{w/ prob. } 50{\scriptstyle \%}   \end{cases} \end{align*}

Each of the jumps has a magnitude of \lambda = 0.05 \times 10^4 \, \sfrac{\mathrm{sh}}{\mathrm{min}} and is equally likely to be positive or negative. The white noise has a standard deviation of \sigma = 0.06 \times 10^4 \, \sfrac{\mathrm{sh}}{\sqrt{\mathrm{min}}}. The resulting series is shown in the left-most panel of the figure below.

plot--wavelet-variance-illustration--17dec2014

By construction, this process only has white noise and 60-minute-long shocks. That’s it. There are no other time scales to worry about. None. If Fourier analysis were the correct tool for identifying horizon-specific trading-volume fluctuations, then you’d expect there to be a spike in the power of the time series at the 60-minute horizon. But, what happens if we look for evidence of this 60-minute timescale by estimating the power spectrum shown in the middle panel of the figure above? Do we see any evidence of a 60-minute shock? No. There is nothing at the 60-minute horizon. Asynchronous shocks of a fixed length don’t show up in the Fourier power spectrum. They do, however, show up in the Wavelet-variance plot as shown in the right-most panel of the figure above where there is a clear spike at the 1-hour horizon.
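
For completeness, here is a minimal Python sketch of the simulation in Equation (3) together with a crude periodogram check. The jump size, noise level, and month length follow the text (assuming a 22-trading-day month); the seed and the FFT-based power calculation are my own choices rather than the post’s actual code.

```python
import numpy as np

rng = np.random.default_rng(2014)

minutes_per_day, days = 390, 22
T = minutes_per_day * days              # one month of minute-by-minute observations
lam, sigma, J = 0.05, 0.06, 1000        # jump size, noise StD (in 10^4 sh units), number of jumps

# Equation (3): white noise plus 1000 randomly timed, randomly signed, hour-long jumps.
vlm = sigma * rng.standard_normal(T)
starts = rng.integers(0, T - 60, size=J)
signs = rng.choice([-1.0, 1.0], size=J)
for tau, sgn in zip(starts, signs):
    vlm[tau:tau + 60] += lam * sgn

# Periodogram of the demeaned series, with frequency reported in cycles per day.
power = np.abs(np.fft.rfft(vlm - vlm.mean()))**2 / T
freq = np.fft.rfftfreq(T, d=1.0) * minutes_per_day

# Compare the average power near the hourly frequency (f = 6.5 cycles/day) to the rest.
near_hourly = (freq > 5.5) & (freq < 7.5)
elsewhere = ~near_hourly
elsewhere[0] = False                    # drop the DC term
print(f"mean power near f = 6.5: {power[near_hourly].mean():.4f}")
print(f"mean power elsewhere:    {power[elsewhere].mean():.4f}")
```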