Wavelet Variance

1. Motivation

Imagine you’re a trader who’s about to put on a position for the next month. You want to hedge away the risk in this position associated with daily fluctuations in market returns. One way you might do this is to short the S&P 500 via E-mini futures, since these contracts are among the most liquid in the world.

plot--sp500-price-volume--24jul2014

But… how much of the variation in the index’s returns is due to fluctuations at the daily horizon? e.g., the blue line in the figure to the right shows the minute-by-minute price of the E-mini contract on May 6th, 2010 during the flash crash. Over the course of 4 minutes, the contract price fell 3{\scriptstyle \%}! It then rebounded to nearly its original level over the next hour. Clearly, if most of the fluctuations in the E-mini S&P 500 contract’s value are due to shocks on the sub-hour time scale, this contract will do a poor job of hedging away daily market risk.

This post demonstrates how to decompose the variance of a time series (e.g., the minute-by-minute returns on the E-mini) into horizon-specific components using wavelets. i.e., using the wavelet variance estimator allows you to ask the questions: “How much of the variance is coming from fluctuations on the scale of 16 minutes? 1 hour? 1 day? 1 month?” I then investigate how this wavelet variance approach compares to other methods financial economists might employ, such as auto-regressive models and spectral analysis.

2. Wavelet Analysis

In order to explain how the wavelet variance estimator works, I first need to give a quick outline of how wavelets work. Wavelets allow you to decompose a signal into components that are independent in both the time and frequency domains. This outline will be as bare bones as possible. See Percival and Walden (2000) for an excellent overview of the topic.

Imagine you’ve got a time series of just T = 8 returns:

(1)   \begin{align*} \mathbf{r} = \begin{bmatrix} r_0 & r_1 & r_2 & r_3 & r_4 & r_5 & r_6 & r_7 \end{bmatrix}^{\top} \end{align*}

and assume for simplicity that these returns have mean \mathrm{E}[r_t] = \mu_r = 0. One thing that you might do with this time series is estimate a regression with time fixed effects: r_t = \sum_{t'=0}^7 \vartheta_{t'} \cdot 1_{\{\mathrm{Time}(r_t) = t'\}}. Here is another way to represent the same regression:

(2)   \begin{align*} \begin{bmatrix} r_0 \\ r_1 \\ r_2 \\ r_3 \\ r_4 \\ r_5 \\ r_6 \\ r_7 \end{bmatrix} &= \begin{pmatrix}  1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\  0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\  0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\  0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\  0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\  0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\  0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\  0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \end{pmatrix} \begin{bmatrix} \vartheta_0 \\ \vartheta_1 \\ \vartheta_2 \\ \vartheta_3 \\ \vartheta_4 \\ \vartheta_5 \\ \vartheta_6 \\ \vartheta_7 \end{bmatrix} \end{align*}

It’s really a trivial projection since \vartheta_t = r_t. Call the projection matrix \mathbf{F} for “fixed effects” so that \mathbf{r} = \mathbf{F}{\boldsymbol \vartheta}.

Obviously, the above time fixed-effects model would be a bit of a silly thing to estimate, but notice that the projection matrix \mathbf{F} has an interesting property. Namely, its columns are orthonormal:

(3)   \begin{align*}  \langle \mathbf{f}(t) | \mathbf{f}(t') \rangle = \begin{cases} 1 &\text{if } t = t' \\ 0 &\text{else } \end{cases} \end{align*}

It’s orthogonal because \langle \mathbf{f}(t) | \mathbf{f}(t') \rangle = 0 unless t = t'. This requirement implies that each column in the projection matrix is picking up different information about \mathbf{r}. It’s normal because \langle \mathbf{f}(t) | \mathbf{f}(t) \rangle is normalized to equal 1. This requirement implies that the projection matrix leaves the magnitude of \mathbf{r} unchanged. The time fixed-effects projection matrix, \mathbf{F}, picks out each time period separately, but you can also think about using other orthonormal bases.

e.g., the Haar wavelet projection matrix compares how the 1st half of the time series differs from the 2nd half, how the 1st quarter differs from the 2nd quarter, how the 3rd quarter differs from the 4th quarter, how the 1st eighth differs from the 2nd eighth, and so on… For the 8 period return time series, let’s denote the columns of the wavelet projection matrix as:

(4)   \begin{align*} \mathbf{w}(3,0) &= \sfrac{1}{\sqrt{8}} \cdot \begin{bmatrix} 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \end{bmatrix}^{\top} \\ \mathbf{w}(2,0) &= \sfrac{1}{\sqrt{8}} \cdot \begin{bmatrix} 1 & 1 & 1 & 1 & -1 & -1 & -1 & -1 \end{bmatrix}^{\top} \\ \mathbf{w}(1,0) &= \sfrac{1}{\sqrt{4}} \cdot \begin{bmatrix} 1 & 1 & -1 & -1 & 0 & 0 & 0 & 0 \end{bmatrix}^{\top} \\ \mathbf{w}(1,1) &= \sfrac{1}{\sqrt{4}} \cdot \begin{bmatrix} 0 & 0 & 0 & 0 & 1 & 1 & -1 & -1 \end{bmatrix}^{\top} \\ \mathbf{w}(0,0) &= \sfrac{1}{\sqrt{2}} \cdot \begin{bmatrix} 1 & -1 & 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}^{\top} \\ \mathbf{w}(0,1) &= \sfrac{1}{\sqrt{2}} \cdot \begin{bmatrix} 0 & 0 & 1 & -1 & 0 & 0 & 0 & 0 \end{bmatrix}^{\top} \\ \mathbf{w}(0,2) &= \sfrac{1}{\sqrt{2}} \cdot \begin{bmatrix} 0 & 0 & 0 & 0 & 1 & -1 & 0 & 0 \end{bmatrix}^{\top} \\ \mathbf{w}(0,3) &= \sfrac{1}{\sqrt{2}} \cdot \begin{bmatrix} 0 & 0 & 0 & 0 & 0 & 0 & 1 & -1 \end{bmatrix}^{\top} \end{align*}

and simple inspection shows that the columns are orthonormal:

(5)   \begin{align*}  \langle \mathbf{w}(h,i) | \mathbf{w}(h',i') \rangle = \begin{cases} 1 &\text{if } h = h', \; i = i' \\ 0 &\text{else } \end{cases} \end{align*}
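To make this orthonormality concrete, here’s a short Python sketch (a stand-in for the code behind the original figures, which isn’t shown in this post) that assembles the 8 Haar columns from equation (4) into a matrix and verifies \mathbf{W}^{\top}\mathbf{W} = \mathbf{I}:

```python
import numpy as np

# Columns of the 8x8 Haar projection matrix, in the order
# w(3,0), w(2,0), w(1,0), w(1,1), w(0,0), w(0,1), w(0,2), w(0,3).
W = np.column_stack([
    np.ones(8) / np.sqrt(8),                              # w(3,0): overall level
    np.r_[1, 1, 1, 1, -1, -1, -1, -1] / np.sqrt(8),       # w(2,0): 1st half vs 2nd half
    np.r_[1, 1, -1, -1, 0, 0, 0, 0] / np.sqrt(4),         # w(1,0): 1st quarter vs 2nd
    np.r_[0, 0, 0, 0, 1, 1, -1, -1] / np.sqrt(4),         # w(1,1): 3rd quarter vs 4th
    np.r_[1, -1, 0, 0, 0, 0, 0, 0] / np.sqrt(2),          # w(0,0): period 0 vs period 1
    np.r_[0, 0, 1, -1, 0, 0, 0, 0] / np.sqrt(2),          # w(0,1)
    np.r_[0, 0, 0, 0, 1, -1, 0, 0] / np.sqrt(2),          # w(0,2)
    np.r_[0, 0, 0, 0, 0, 0, 1, -1] / np.sqrt(2),          # w(0,3)
])

# Orthonormal columns mean W^T W is the 8x8 identity matrix.
assert np.allclose(W.T @ W, np.eye(8))
```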

Let’s look at a concrete example. Suppose that we want to project the vector:

(6)   \begin{align*} \mathbf{r} = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}^{\top} \end{align*}

onto the wavelet basis:

(7)   \begin{align*} \begin{bmatrix} r_0 \\ r_1 \\ r_2 \\ r_3 \\ r_4 \\ r_5 \\ r_6 \\ r_7 \end{bmatrix} &= \begin{pmatrix}  \sfrac{1}{\sqrt{8}} & \sfrac{1}{\sqrt{8}}  & \sfrac{1}{\sqrt{4}}  & 0 & \sfrac{1}{\sqrt{2}} & 0 & 0 & 0 \\  \sfrac{1}{\sqrt{8}} & \sfrac{1}{\sqrt{8}}  & \sfrac{1}{\sqrt{4}}  & 0 & -\sfrac{1}{\sqrt{2}} & 0 & 0 & 0 \\  \sfrac{1}{\sqrt{8}} & \sfrac{1}{\sqrt{8}}  & -\sfrac{1}{\sqrt{4}} & 0 & 0 & \sfrac{1}{\sqrt{2}} & 0 & 0 \\  \sfrac{1}{\sqrt{8}} & \sfrac{1}{\sqrt{8}}  & -\sfrac{1}{\sqrt{4}} & 0 & 0 & -\sfrac{1}{\sqrt{2}} & 0 & 0 \\  \sfrac{1}{\sqrt{8}} & -\sfrac{1}{\sqrt{8}} & 0 & \sfrac{1}{\sqrt{4}}  & 0 & 0 & \sfrac{1}{\sqrt{2}} & 0 \\  \sfrac{1}{\sqrt{8}} & -\sfrac{1}{\sqrt{8}} & 0 & \sfrac{1}{\sqrt{4}}  & 0 & 0 & -\sfrac{1}{\sqrt{2}} & 0 \\  \sfrac{1}{\sqrt{8}} & -\sfrac{1}{\sqrt{8}} & 0 & -\sfrac{1}{\sqrt{4}} & 0 & 0 & 0 & \sfrac{1}{\sqrt{2}} \\  \sfrac{1}{\sqrt{8}} & -\sfrac{1}{\sqrt{8}} & 0 & -\sfrac{1}{\sqrt{4}} & 0 & 0 & 0 & -\sfrac{1}{\sqrt{2}} \end{pmatrix} \begin{bmatrix} \theta_0 \\ \theta_1 \\ \theta_2 \\ \theta_3 \\ \theta_4 \\ \theta_5 \\ \theta_6 \\ \theta_7 \end{bmatrix} \end{align*}

What would the wavelet coefficients {\boldsymbol \theta} look like? Well, because the columns of \mathbf{W} are orthonormal, we can read them off directly as {\boldsymbol \theta} = \mathbf{W}^{\top}\mathbf{r}:

(8)   \begin{align*} {\boldsymbol \theta} = \begin{bmatrix} \sfrac{1}{\sqrt{8}} & \sfrac{1}{\sqrt{8}} & \sfrac{1}{\sqrt{4}} & 0 & \sfrac{1}{\sqrt{2}} & 0 & 0 & 0 \end{bmatrix}^{\top} \end{align*}

since this is the only combination of coefficients that satisfies both r_0 = 1:

(9)   \begin{align*} 1 &=  r_0 \\ &= \frac{1}{\sqrt{8}} \cdot w_0(3,0) + \frac{1}{\sqrt{8}} \cdot w_0(2,0) + \frac{1}{\sqrt{4}} \cdot w_0(1,0) + \frac{1}{\sqrt{2}} \cdot w_0(0,0) \\ &= \frac{1}{8} + \frac{1}{8} + \frac{1}{4} + \frac{1}{2} \end{align*}

and r_t = 0 for all t > 0.
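This projection is easy to check numerically. The sketch below (again in Python, as a stand-in for the post’s own code) rebuilds the matrix \mathbf{W} from equation (7), projects \mathbf{r} = \begin{bmatrix}1 & 0 & \cdots & 0\end{bmatrix}^{\top} onto it, and recovers exactly the coefficients in equation (8):

```python
import numpy as np

# Rebuild the 8x8 Haar matrix W from Eq. (7).
s8, s4, s2 = np.sqrt(8), np.sqrt(4), np.sqrt(2)
W = np.column_stack([
    np.ones(8) / s8,
    np.r_[1, 1, 1, 1, -1, -1, -1, -1] / s8,
    np.r_[1, 1, -1, -1, 0, 0, 0, 0] / s4,
    np.r_[0, 0, 0, 0, 1, 1, -1, -1] / s4,
    np.r_[1, -1, 0, 0, 0, 0, 0, 0] / s2,
    np.r_[0, 0, 1, -1, 0, 0, 0, 0] / s2,
    np.r_[0, 0, 0, 0, 1, -1, 0, 0] / s2,
    np.r_[0, 0, 0, 0, 0, 0, 1, -1] / s2,
])

r = np.eye(8)[0]        # r = [1, 0, 0, 0, 0, 0, 0, 0]
theta = W.T @ r         # orthonormality: no trial and error needed
# theta = [1/sqrt(8), 1/sqrt(8), 1/sqrt(4), 0, 1/sqrt(2), 0, 0, 0], as in Eq. (8)
assert np.allclose(W @ theta, r)    # and W @ theta reconstructs r exactly
```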

What’s cool about the wavelet projection is that the coefficients represent effects that are isolated in both the frequency and time domains. The index h=0,1,2,3 denotes the \log_2 length of the wavelet comparison groups. e.g., the 4 wavelets with h=0 compare 2^0 = 1 period increments: the 1st period to the 2nd period, the 3rd period to the 4th period, and so on… Similarly, the wavelets with h=1 compare 2^1 = 2 period increments: the 1st 2 periods to the 2nd 2 periods and the 3rd 2 periods to the 4th 2 periods. Thus, h captures the location of the coefficient in the frequency domain. The index i=0,\ldots,I_h-1 signifies which comparison group at horizon h we are looking at. e.g., when h=0, there are I_0 = 4 = \sfrac{8}{2^{0+1}} different comparisons to be made. Thus, i captures the location of the coefficient in the time domain.

3. Wavelet Variance

With these basics in place, it’s now easy to define the wavelet variance of a time series. First, I massage the standard representation of a series’ variance a bit. The variance of our 8 term series is defined as:

(10)   \begin{align*}  \sigma_r^2 &= \frac{1}{T} \cdot \sum_t r_t^2  \end{align*}

since \mu_r = 0. Using the tools from the section above, let’s rewrite \mathbf{r} = \mathbf{W}{\boldsymbol \theta}. This means that the variance formula becomes:

(11)   \begin{align*}  \sigma_r^2 &= \frac{1}{T} \cdot \mathbf{r}^{\top} \mathbf{r} =  \frac{1}{T} \cdot \left( \mathbf{W} {\boldsymbol \theta} \right)^{\top} \left( \mathbf{W} {\boldsymbol \theta} \right) \end{align*}

But I know that \mathbf{W}^{\top} \mathbf{W} = \mathbf{I} since each of the columns is orthonormal. Thus:

(12)   \begin{align*}  \sigma_r^2 &= \frac{1}{T} \cdot {\boldsymbol \theta}^{\top} {\boldsymbol \theta} = \frac{1}{T} \cdot \sum_{h,i} \theta(h,i)^2 \end{align*}

This representation gives the variance of a series as an average of squared wavelet coefficients.

The sum of the squared wavelet coefficients at each horizon, h, is then an interesting object:

(13)   \begin{align*} V(h) &= \frac{1}{T} \cdot \sum_{i=0}^{I_h - 1} \theta(h,i)^2 \end{align*}

since V(h) denotes the fraction of the total variance of the time series explained by comparing successive periods of length 2^h. I refer to V(h) as the wavelet variance of a series at horizon h. The sum of the wavelet variances at each horizon gives the total variance (the coefficient on the constant column \mathbf{w}(3,0) drops out because \mu_r = 0):

(14)   \begin{align*} \sum_{h=0}^H V(h) &= \sigma_r^2 \end{align*}
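The decomposition in equations (13) and (14) can be sketched in a few lines of Python. This is a minimal illustration, not the post’s original code: it runs the standard Haar pyramid transform on a series whose length is a power of 2, collects the squared detail coefficients at each horizon h, and checks that the horizon variances sum to \sfrac{1}{T} \cdot \sum_t r_t^2. Because the simulated series has a nonzero sample mean, the leftover scaling term is kept so that the identity holds exactly.

```python
import numpy as np

def haar_variance_by_horizon(r):
    """Decompose (1/T) * sum(r_t^2) into horizon components V(h)
    via the Haar wavelet transform. Assumes len(r) is a power of 2."""
    T = len(r)
    a = np.asarray(r, dtype=float)              # smooth (scaling) coefficients
    V = {}
    h = 0
    while len(a) > 1:
        d = (a[0::2] - a[1::2]) / np.sqrt(2.0)  # detail coefficients at horizon h
        a = (a[0::2] + a[1::2]) / np.sqrt(2.0)  # smooths carried up to horizon h+1
        V[h] = np.sum(d**2) / T
        h += 1
    V['mean'] = a[0]**2 / T    # leftover scaling term; ~0 when mu_r = 0
    return V

rng = np.random.default_rng(0)
r = rng.standard_normal(1024)
V = haar_variance_by_horizon(r)
assert np.isclose(sum(V.values()), np.mean(r**2))   # Eq. (14)
```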

4. Numerical Example

Let’s take a look at how the wavelet variance of a time series behaves out in the wild. Specifically, let’s study the simulated data plotted below, which consists of 63 days of minute-by-minute return data with day-specific shocks:

(15)   \begin{align*} r_t &= \mu_{r,t} + \sigma_r \cdot \epsilon_t \qquad \text{with} \qquad \epsilon_t \overset{\scriptscriptstyle \mathrm{iid}}{\sim} \mathrm{N}(0,1) \end{align*}

where the volatility of the process is given by \sigma_r = 0.01{\scriptstyle \mathrm{bp}/\sqrt{\mathrm{min}}} and there is a 5{\scriptstyle \%} probability of realizing a \mu_{r,t} = \pm 0.001{\scriptstyle \mathrm{bp}/\mathrm{min}} shock on any given day. The 4 days on which the data realized a shock are highlighted in red. These minute-by-minute figures amount to a 0{\scriptstyle \%/\mathrm{yr}} annualized return and a 31{\scriptstyle \%/\mathrm{yr}} annualized volatility.

plot--why-use-wavelet-variance--daily-shocks--time-series--25jul2014
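A simulation along the lines of equation (15) can be sketched as follows. This is an illustrative reconstruction, not the original simulation code; the seed and the exact shock draws are arbitrary, and the parameter values are taken from the text above.

```python
import numpy as np

# Simulate Eq. (15): 63 trading days of minute-by-minute returns with a
# day-specific drift shock mu_{r,t} drawn on roughly 5% of days.
rng = np.random.default_rng(42)
n_days, mins_per_day = 63, 390          # 6.5 hours x 60 minutes per trading day
sigma_r = 0.01                          # volatility per sqrt(minute)
shock_size, shock_prob = 0.001, 0.05    # day-specific drift and its probability

shock_days = rng.random(n_days) < shock_prob          # which days get a shock
signs = rng.choice([-1.0, 1.0], size=n_days)          # shock sign, +/- equally likely
mu = np.repeat(np.where(shock_days, signs * shock_size, 0.0), mins_per_day)
r = mu + sigma_r * rng.standard_normal(n_days * mins_per_day)

print(r.shape, shock_days.sum())
```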

The figure below then plots the wavelet coefficients, {\boldsymbol \theta}, at each horizon associated with this time series. A trading day is 6.5 \times 60 = 390{\scriptstyle \mathrm{min}}, so notice the spikes in the coefficient values in the h=6,7,8 panels (comparisons of successive 64-, 128-, and 256-minute intervals) near the day-specific shock dates. The remaining variation in the coefficient levels comes from the underlying white noise process \epsilon_t. Because the break points in the wavelet projection affect the estimated coefficients, each data point in the plot actually represents the average of the coefficient estimates \theta_t(h,i) at a given point of time over all possible starting dates. See Percival and Walden (2000, Ch. 5) on the maximal overlap discrete wavelet transform for details.

plot--why-use-wavelet-variance--daily-shocks--wavelet-coefficients--25jul2014

Finally, I plot the \log of the wavelet variance at each horizon h for both the simulated return process (red) and a white noise process with an identical mean and variance (blue). Note that I’ve switched from \log_2 to \log_e on the x-axis here, so a spike in the amount of variance at h=6 corresponds to a spike in the amount of variance explained by successive e^{6} \approx 400{\scriptstyle \mathrm{min}} increments. This is exactly what you’d expect for day-specific shocks which have a duration of 390{\scriptstyle \mathrm{min}} as indicated by the vertical gray line. The wavelet variance of an appropriately scaled white noise process gives a nice comparison group. To see why, note that for covariance stationary processes like white noise, the wavelet variance at a particular horizon is related to the power spectrum as follows:

(16)   \begin{align*} V(h) &\approx 2 \cdot \int_{\sfrac{1}{2^{h+1}}}^{\sfrac{1}{2^h}} S(f) \cdot df \end{align*}

Thus, the wavelet variance of white noise should follow a power law with:

(17)   \begin{align*} V(h) &\propto 2^{-h} \end{align*}

giving a nice smooth reference point in plots.
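This power law is easy to verify by simulation. The sketch below (illustrative code, not from the original post) runs the Haar pyramid transform on a long white noise series and checks that the wavelet variance falls by roughly a factor of 2 per horizon, as equation (17) predicts:

```python
import numpy as np

# For white noise, V(h) should be proportional to 2^(-h), so the ratio
# V(h) / V(h+1) should be close to 2 at every horizon.
rng = np.random.default_rng(1)
a = rng.standard_normal(2**16)
T = len(a)
V = []
for h in range(8):
    d = (a[0::2] - a[1::2]) / np.sqrt(2.0)   # Haar detail coefficients, horizon h
    a = (a[0::2] + a[1::2]) / np.sqrt(2.0)   # smooths carried to the next horizon
    V.append(np.sum(d**2) / T)

ratios = np.array(V[:-1]) / np.array(V[1:])
print(np.round(ratios, 2))   # each ratio should be close to 2
```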

plot--why-use-wavelet-variance--daily-shocks--wavelet-variance--25jul2014

5. Comparing Techniques

I conclude by considering how the wavelet variance statistic compares to other ways that a financial economist might look for horizon-specific effects in data. I consider 2 alternatives: auto-regressive models and spectral density estimators. First, consider estimating the auto-regressive model below with lags \ell = 1,2,\ldots,L:

(18)   \begin{align*} r_t &= \sum_{\ell=1}^L C(\ell) \cdot r_{t-\ell} + \xi_t \qquad \text{where} \qquad \xi_t \overset{\scriptscriptstyle \mathrm{iid}}{\sim} \mathrm{N}(0,\sigma_{\xi}^2) \end{align*}

The left-most panel of the figure below reports the estimated values of C(\ell) for lags \ell = 1,2,\ldots,420 using the simulated data (red) as well as a scaled white noise process (blue). Just as before, the vertical gray line denotes the number of minutes in a day. There is no meaningful difference between the 2 sets of coefficients. The reason is that the day-specific shocks are asynchronous: they aren’t coming at regular intervals, so no obvious lag structure can emerge from the data.

plot--why-use-wavelet-variance--daily-shocks--analysis--25jul2014

Next, let’s think about estimating the spectral density of \mathbf{r}. This turns out to be the exact same exercise as the auto-regressive model estimation in different clothing. As shown in an earlier post, it’s possible to flip back and forth between the coefficients of an \mathrm{AR}(L) process and its spectral density via the relationship:

(19)   \begin{align*} S(f) &= \frac{\sigma_{\xi}^2}{\left| \, 1 - \sum_{\ell=1}^L C(\ell) \cdot e^{-i \cdot 2 \cdot \pi \cdot f \cdot \ell} \, \right|^2} \end{align*}

This one-to-one mapping between the frequency domain and the time domain for covariance stationary processes is known as the Wiener–Khinchin theorem with \sigma_x^2 = \int_{-\sfrac{1}{2}}^{\sfrac{1}{2}} S(f) \cdot df. Thus, the spectral density plot just reflects the same random noise as the auto-regressive model coefficients because of the same issue with asynchrony. The most interesting features of the middle panel occur at really high frequencies which have nothing to do with the day-specific shocks.
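The Wiener–Khinchin identity can be checked numerically. The sketch below uses an illustrative AR(1) process r_t = \phi \cdot r_{t-1} + \xi_t with made-up parameter values (not estimates from the simulated data): integrating the spectral density in equation (19) over f \in [-\sfrac{1}{2}, \sfrac{1}{2}] should recover the process variance \sigma_{\xi}^2 / (1 - \phi^2).

```python
import numpy as np

# Wiener-Khinchin sanity check for an AR(1) process.
phi, sigma_xi = 0.6, 1.0
f = np.linspace(-0.5, 0.5, 200001)
S = sigma_xi**2 / np.abs(1.0 - phi * np.exp(-1j * 2.0 * np.pi * f))**2

# Rectangle-rule approximation of the integral of S(f) over [-1/2, 1/2].
integral = np.sum(S) * (f[1] - f[0])

# Both numbers should match: sigma_xi^2 / (1 - phi^2) = 1 / 0.64 = 1.5625.
print(integral, sigma_xi**2 / (1.0 - phi**2))
```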

Here’s the punchline. The wavelet variance is the only one of the 3 estimators that can identify horizon-specific contributions to a time series’ variance that are not stationary.
