Factors vs. Characteristics

1. Introduction

Fama and French (1993) found that both a firm’s size and its book-to-market ratio are highly correlated with its average excess return, as illustrated in Figure 1 below. For instance, the center panel says that stocks with low book-to-market ratios (i.e., the 5 portfolios at the bottom linked with an orange line) have too high a \beta_{\mathrm{Mkt},n} on the market when considering their paltry realized excess returns. For some reason, it doesn’t take much to get traders to hold growth stocks.

FIGURE 1. Left Panel: Average excess returns vs. the market beta for 25 portfolios sorted on the basis of size and book-to-market ratio using monthly data over the time period from July 1963 to December 1993. Center Panel: Same 25 data points connected by book-to-market ratio with \mathrm{BM}_{\mathrm{Low}} denoting the 5 portfolios in the lowest book-to-market quintile. Right Panel: Same 25 data points connected by size with \mathrm{S}_{\mathrm{Low}} denoting the 5 portfolios in the lowest size quintile. Plots correspond to Figures 20.9, 20.10, and 20.11 in Cochrane (2001).

This post reviews the analysis in Daniel and Titman (1997) which asks the natural follow up question: Why? The original explanation proposed in Fama and French (1993) was that these additional excess returns earned by small firms with high book-to-market ratios were due to exposures to latent risk factors. e.g., a stock with a high book-to-market ratio will tend to do poorly when the entire economy suffers from a financial crisis and precisely when you need cash the most. As a result, you are willing to pay less in order to hold this risk. However, Daniel and Titman (1997) suggest an alternative explanation: some omitted variable both causes value stocks to earn higher excess returns (i.e., have a high \alpha_{n,t}) and comove with one another (i.e., have a high \beta_{\mathrm{HML},n}).

Daniel and Titman (1998) highlight a nice parallel between the causal inference problem outlined above and the inference problem facing an econometrician when trying to figure out the causal effect of going to college on a student’s future earnings. We all know that people with college degrees earn more over their lifetime than people without college degrees (e.g., see Card (1999)). Just as above, the main question is: Why? On one hand, it could be that the process of getting a degree raises your earning power (analogous to the “factor model”). However, it could also be that IQ really drives everyone’s lifetime earnings and on average people with higher IQs are more likely to get college degrees (analogous to the “characteristics model”). In this situation, finding that college graduates earn more than non-graduates says nothing about the relative value of person n’s IQ or her degree in determining her salary:

(1)   \begin{align*} \mathrm{salary}_n &= \mu + \lambda_{\mathrm{GRAD}} \cdot 1_{\{\mathrm{GRAD}_n = 1\}} + \xi_n, \quad \lambda_{\mathrm{GRAD}} > 0 \end{align*}

Similarly, finding that stocks with high book-to-market ratios realize higher excess returns says nothing about where these excess returns are coming from. The only real conceptual difference between the two inference problems is in the case of graduation vs. IQ, the inputs to the regression are data; by contrast, in the case of factors vs characteristics, the inputs to the regression are estimated coefficients:

(2)   \begin{align*} \alpha_n &= \mu + \lambda_{\mathrm{HML}} \cdot \beta_{\mathrm{HML},n} + \xi_n, \quad \lambda_{\mathrm{HML}} > 0 \end{align*}

where \alpha_n is the monthly abnormal return to holding stock/portfolio n and \beta_{\mathrm{HML},n} is stock n’s loading on the high-minus-low book-to-market factor from Fama and French (1993).
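To make the graduation vs. IQ problem concrete, here is a minimal simulation sketch. Everything in it is a made-up assumption for illustration: the variable names, the sample size, and a data-generating process in which salary depends only on IQ while graduation is merely correlated with IQ. Even so, the regression in Equation (1) recovers a large positive \lambda_{\mathrm{GRAD}}.

```python
import random
import statistics

random.seed(0)

# Hypothetical data-generating process: salary depends ONLY on IQ,
# but people with higher IQs are more likely to graduate.
N = 10_000
iq = [random.gauss(0.0, 1.0) for _ in range(N)]
grad = [1 if x + random.gauss(0.0, 1.0) > 0 else 0 for x in iq]
salary = [50.0 + 10.0 * x + random.gauss(0.0, 5.0) for x in iq]

# With a single dummy regressor, the OLS slope in Equation (1) is just the
# difference in mean salary between graduates and non-graduates.
mean_grad = statistics.fmean(s for s, g in zip(salary, grad) if g == 1)
mean_non = statistics.fmean(s for s, g in zip(salary, grad) if g == 0)
lambda_grad = mean_grad - mean_non

print(f"lambda_GRAD = {lambda_grad:.2f}")
```

The degree has no causal effect in this simulated world, yet the estimated coefficient is positive. The same logic applies when \beta_{\mathrm{HML},n} plays the role of the degree and an unobserved characteristic plays the role of IQ.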

I begin in Section 2 by describing Fama and French (1993)’s interpretation of the size and value premia. Then, in Section 3, I outline the alternative interpretation of these effects given by Daniel and Titman (1997). The authors propose a test to determine if some of the effect of the size and value premia flows through a channel other than the factor loadings. In Section 4, I describe this test and replicate their empirical analysis suggesting that there is indeed a component to the size and value premia that cannot be explained by factor loadings. Finally, in Section 5, I conclude with a short discussion of Daniel and Titman (1997)’s results. All of the code used to create the figures in this post can be found on GitHub.

2. Distress Factor Loading

This section describes Fama and French (1993)’s interpretation of the value premium, i.e., the higher excess returns earned by stocks with a high book-to-market ratio. A stock with a high book-to-market ratio has lots of tangible assets on its books in accounting terms (i.e., a high book value); however, the market does not value the equity in this company very highly (i.e., a low market capitalization). These stocks are in financial distress. Define \tilde{r}_{n,t+1} as the abnormal return to stock n after accounting for its comovement with the market return:

(3)   \begin{align*} \tilde{r}_{n,t+1} &= \left( r_{n,t+1} - r_{f,t+1} \right) - \beta_{\mathrm{Mkt},n} \cdot r_{\mathrm{Mkt},t+1} \end{align*}

Figure 2 below shows that firms with high book-to-market ratios have really high returns and firms with low book-to-market ratios have really low returns on average.

FIGURE 2. Monthly excess returns of 25 portfolios sorted on size and book-to-market ratio using data from July 1963 to December 1993. e.g., the time series in the \mathrm{BM}_{\mathrm{Low}} \times \mathrm{S}_{\mathrm{High}} panel in the lower left-hand corner corresponds to the monthly excess returns over the 30-day T-bill rate of a value weighted portfolio of stocks in the lowest book-to-market ratio quintile and the highest size quintile. The \mu value reported in the lower right-hand corner of each panel represents the mean excess return over the sample period and corresponds to the values reported in Table 1(a) from Daniel and Titman (1997). The height of the shaded red region in each panel is \mu which makes it easier to see how the mean excess returns vary across the 25 portfolios.

If a financial crisis comes along, it will hit all of the firms already in financial distress the hardest. Fama and French (1993) point out that the outsized excess returns earned by high book-to-market stocks are consistent with the idea that traders don’t want to find out that their stocks have become worthless in the middle of a financial crisis. Thus, in order to hold these stocks, they must be rewarded with higher average excess returns. If this story is true, then these higher average excess returns will result from a larger \beta_{\mathrm{HML},n} \cdot \mathrm{E}_t[f_{\mathrm{HML},t+1}] term in the intercept to the regression equation:

(4)   \begin{align*} \tilde{r}_{n,t+1} &= \mathrm{E}_t[\tilde{r}_{n,t+1}] + \beta_{\mathrm{HML},n} \cdot f_{\mathrm{HML},t+1} + \varepsilon_{n,t+1} \\ &= \underbrace{\left( \mathrm{E}_t[\tilde{r}_{n,t+1}] - \beta_{\mathrm{HML},n} \cdot \mathrm{E}_t[f_{\mathrm{HML},t+1}] \right)}_{\alpha_{n,t}} + \beta_{\mathrm{HML},n} \cdot \left( f_{\mathrm{HML},t+1} - \mathrm{E}_t[f_{\mathrm{HML},t+1}]  \right) + \varepsilon_{n,t+1} \end{align*}

One way to test this hypothesis would be to create a group of N test assets, run N versions of the time series regression specified in Equation (4) above to collect the \alpha_{n,t} and \beta_{\mathrm{HML},n} coefficients, and test to see if a nice linear relationship holds between the realized excess returns and each stock/portfolio’s loading on the HML factor:

(5)   \begin{align*} \alpha_{n,t} &= \mu + \lambda_{\mathrm{HML}} \cdot \beta_{\mathrm{HML},n} + \xi_{n,t} \end{align*}
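The two-pass procedure just described can be sketched in a few lines. This is an illustration on simulated data, not the code behind the figures: the helper `ols`, the sample sizes, and the factor parameters are all hypothetical. It also assumes the first-pass intercept is taken from a regression on the demeaned factor, so that it equals each asset’s sample mean abnormal return.

```python
import random
import statistics

random.seed(4)

def ols(x, y):
    """Return (intercept, slope) from a univariate least-squares fit."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

T, N = 360, 25                                   # months, test assets
f = [random.gauss(0.4, 2.5) for _ in range(T)]   # simulated HML factor
f_bar = statistics.fmean(f)
f_dm = [v - f_bar for v in f]                    # demeaned factor

# First pass: one time-series regression per test asset.
alphas, betas = [], []
for n in range(N):
    true_beta = -0.5 + 1.5 * n / (N - 1)         # hypothetical loadings
    r = [true_beta * v + random.gauss(0.0, 1.0) for v in f]
    a, b = ols(f_dm, r)
    alphas.append(a)
    betas.append(b)

# Second pass: cross-sectional regression of intercepts on loadings.
mu, lam = ols(betas, alphas)
print(f"mu = {mu:.3f}, lambda_HML = {lam:.3f}, mean factor = {f_bar:.3f}")
```

Because the simulated returns are driven only by the factor, the second-pass slope lands close to the factor’s sample mean and the intercept close to zero. That is the “nice linear relationship” a pure factor story predicts.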

Figure 3 below shows the causal diagram assumed in Fama and French (1993) linking each stock’s average excess returns, \alpha_{n,t}, to its loading on the HML factor, \beta_{\mathrm{HML},n}. Figure 4 then shows that controlling for exposure to size and book-to-market ratio explains away much of the residual variation in the excess returns of the 25 test assets in Fama and French (1993) that isn’t explained by their comovement with the market.

FIGURE 3. Causal diagram linking the coefficients \alpha_{n,t} and \beta_{\mathrm{HML},n} assumed in Fama and French (1993).

FIGURE 4. Average excess return vs. excess return predicted by the Fama and French (1993) 3-factor model computed for 25 portfolios sorted on the basis of size and book-to-market ratio using monthly data over the time period from July 1963 to December 1993. Left Panel: Data points connected by book-to-market ratio with \mathrm{BM}_{\mathrm{Low}} denoting the 5 portfolios in the lowest book-to-market quintile. Right Panel: Data points connected by size with \mathrm{S}_{\mathrm{Low}} denoting the 5 portfolios in the lowest size quintile. Plots correspond to Figures 20.12 and 20.13 in Cochrane (2001).

3. Characteristics-Based Pricing

In this section I describe Daniel and Titman (1997)’s alternative interpretation of the value premium. These authors start with a similar first stage regression model:

(6)   \begin{align*} \tilde{r}_{n,t+1} &= \mathrm{E}_t[\tilde{r}_{n,t+1}|D_n] + \beta_{\mathrm{HML},n} \cdot f_{\mathrm{HML},t+1} + \varepsilon_{n,t+1} \end{align*}

but replace the unconditional expectation \mathrm{E}_t[\tilde{r}_{n,t+1}] with the conditional expectation \mathrm{E}_t[\tilde{r}_{n,t+1}|D_n]. i.e., they propose that there is an omitted variable related to the fundamental “distressed-ness” of each firm n. Under this hypothesis, as a firm gets more and more financially distressed, its average excess returns must rise by an amount \lambda_D in order to induce traders to hold the stock. Thus, the time series regression in Equation (4) becomes:

(7)   \begin{align*} \tilde{r}_{n,t+1} &= \underbrace{\left( \mathrm{E}_t[\tilde{r}_{n,t+1}] - D_n \cdot \lambda_D - \beta_{\mathrm{HML},n} \cdot \mathrm{E}_t[f_{\mathrm{HML},t+1}] \right)}_{\alpha_{n,t}}  \\ &\qquad \qquad + \ \beta_{\mathrm{HML},n} \cdot \left( f_{\mathrm{HML},t+1} - \mathrm{E}_t[f_{\mathrm{HML},t+1}]  \right) + \varepsilon_{n,t+1} \end{align*}

with the second stage regression:

(8)   \begin{align*} \alpha_{n,t} &= \mu + \lambda_{\mathrm{HML}} \cdot \beta_{\mathrm{HML},n} + \lambda_D \cdot D_n + \xi_{n,t} \end{align*}

Figure 5 below shows the causal diagram assumed in Daniel and Titman (1997) linking each stock’s average excess returns, \alpha_{n,t}, to its loading on the HML factor, \beta_{\mathrm{HML},n}, and its distressed-ness, D_n. The dotted line linking \beta_{\mathrm{HML},n} and D_n captures the idea that distressed firms are likely to have larger loadings on the \mathrm{HML} factor in the same way that people with higher IQs are more likely to go to college.

FIGURE 5. Causal diagram linking the coefficients \alpha_{n,t} and \beta_{\mathrm{HML},n} assumed in Daniel and Titman (1997).

The natural way to break this logjam and determine whether the value premium is due to a factor-loading or a characteristics-based explanation would be to use an instrument. e.g., find some variable that is correlated with each firm’s factor loading, \beta_{\mathrm{HML},n}, but uncorrelated with its distress status, D_n; or, find some variable that is correlated with each firm’s distress status but uncorrelated with its factor loading. Similarly, to solve the graduation vs. IQ debate from the introduction, you would need either an instrument that randomly assigns people with the same IQ to college and non-college groups or an instrument that randomly shocks people’s IQs once they have made their college decision one way or another.

Daniel and Titman (1997) instrument for each firm’s level of distress, D_n. Note that the analogy to the instrumental variables approach here is imprecise since we can’t actually observe each firm’s level of distress directly. e.g., it would be impossible to predict the variable D_n in a regression. Within each size and book-to-market bucket, Daniel and Titman (1997) use a firm’s exposure to the \mathrm{HML} factor prior to the portfolio formation period as the instrument:

(9)   \begin{align*} Z_n &= \{ z_L, z_2, z_3, z_4, z_H\} \end{align*}

The logic behind this instrument is the following: If characteristics drive expected returns, there should be firms with characteristics that do not match their factor loadings. All the stocks in the same size and book-to-market deciles will have the same loading on the \mathrm{HML} factor. However, within each of the size and book-to-market buckets, there will be firms whose returns have been highly correlated with the \mathrm{HML} factor in the past as well as firms whose returns have been weakly correlated with the \mathrm{HML} factor in the past. Daniel and Titman (1997) think about this within group historical variation as exogenous and use it to instrument for each firm’s true level of distress.

I use Z_n = z_H to denote the firms with the highest historical correlation with the \mathrm{HML} factor and Z_n = z_L to denote the firms with the lowest historical correlation. To empirically estimate whether or not more distressed firms earn higher average excess returns independent of their \mathrm{HML} factor loading, Daniel and Titman (1997) first sort stocks into size and book-to-market buckets to create a residual \tilde{\alpha}_{n,t} that captures the excess returns not explained by firms’ factor loadings:

(10)   \begin{align*} \tilde{\alpha}_{n,t} &= \alpha_{n,t} - \left( \mu + \lambda_{\mathrm{HML}} \cdot \beta_{\mathrm{HML},n} \right) \end{align*}

They then compute:

(11)   \begin{align*} \mathrm{E}[\tilde{\alpha}_{n,t} | Z_n = z_H] - \mathrm{E}[\tilde{\alpha}_{n,t} | Z_n = z_L] &= \lambda_D \cdot \left( \mathrm{E}[D_n | Z_n = z_H] - \mathrm{E}[D_n | Z_n = z_L ] \right) \\ &\qquad \qquad + \ \left( \mathrm{E}[\xi_{n,t} | Z_n = z_H] - \mathrm{E}[\xi_{n,t} | Z_n = z_L ] \right) \end{align*}

which captures the mean effect of being more distressed, \lambda_D, times the average level of additional distress experienced by firms with a high historical correlation with the \mathrm{HML} factor:

(12)   \begin{align*} \mathrm{E}[D_n | Z_n = z_H] - \mathrm{E}[D_n | Z_n = z_L ] \end{align*}

4. Empirical Analysis

This section replicates the main empirical results in Daniel and Titman (1997). I calculate each stock’s book equity using COMPUSTAT data as stockholders’ equity plus any deferred taxes and any investment tax credit, minus the value of any preferred stock. I calculate each stock’s market equity using CRSP data as the number of shares outstanding times its share price. To compute the book-to-market ratio in year t, I use the book equity value from any point in year (t - 1), and the market equity on the last trading day in year (t - 1). The market equity value used in forming the size portfolios is from the last trading day of June of year t. I exclude firms that have been listed on COMPUSTAT for less than 2 years or have a book-to-market ratio of less than 0. I require that firms have prices available on CRSP in both December of (t - 1) and June of year t. See Figure 6 below for a summary of the timing.

FIGURE 6. Timing of the portfolio creation and holding periods associated with the size and book-to-market portfolios analyzed in Daniel and Titman (1997).

I use the size and book-to-market ratio data to create the Fama and French (1993) \mathrm{SMB} and \mathrm{HML} factors as follows. For the \mathrm{SMB} factor, big stocks (B) are above the median market equity of NYSE firms and small stocks (S) are below the median. For the \mathrm{HML} factor, low book-to-market ratio stocks (L) are below the 30th percentile of the book-to-market ratios of NYSE firms, medium book-to-market ratio stocks (M) are in the middle 40{\scriptstyle \%}, and high book-to-market ratio stocks (H) are in the top 30{\scriptstyle \%}. Using these buckets, I then form 6 value-weighted portfolios from the intersections of the 2 size groups and 3 book-to-market groups and estimate the \mathrm{SMB} and \mathrm{HML} factors from these portfolio returns:

(13)   \begin{align*} f_{\mathrm{HML},t} &= \left( \frac{r_{S,H,t} + r_{B,H,t}}{2} \right) - \left( \frac{r_{S,L,t} + r_{B,L,t}}{2} \right) \\ f_{\mathrm{SMB},t} &= \left( \frac{r_{S,H,t} + r_{S,M,t} + r_{S,L,t}}{3} \right) - \left( \frac{r_{B,H,t} + r_{B,M,t} + r_{B,L,t}}{3} \right) \end{align*}
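As a rough sketch of the bucket assignment and of Equation (13), the snippet below classifies a firm against NYSE breakpoints and combines six portfolio returns into the two factors. The function names, the toy breakpoint sample, and the return numbers are all hypothetical. It uses the inclusive percentile method from Python’s `statistics` module, which may differ in detail from the breakpoint convention used in the original papers.

```python
import statistics

def size_bucket(me, nyse_me):
    """'B' if market equity is above the NYSE median, else 'S'."""
    return "B" if me > statistics.median(nyse_me) else "S"

def bm_bucket(bm, nyse_bm):
    """'L' below the NYSE 30th percentile, 'H' above the 70th, else 'M'."""
    cuts = statistics.quantiles(nyse_bm, n=10, method="inclusive")
    p30, p70 = cuts[2], cuts[6]
    if bm < p30:
        return "L"
    if bm > p70:
        return "H"
    return "M"

def hml_smb(r):
    """Equation (13): r maps 'SH', 'SM', 'SL', 'BH', 'BM', 'BL' to returns."""
    hml = (r["SH"] + r["BH"]) / 2 - (r["SL"] + r["BL"]) / 2
    smb = (r["SH"] + r["SM"] + r["SL"]) / 3 - (r["BH"] + r["BM"] + r["BL"]) / 3
    return hml, smb

# Toy NYSE book-to-market sample: 0.1, 0.2, ..., 1.1
nyse_bm = [i / 10 for i in range(1, 12)]
print(bm_bucket(0.2, nyse_bm), bm_bucket(0.6, nyse_bm), bm_bucket(0.95, nyse_bm))

# Made-up value-weighted portfolio returns for one month, in percent
month = {"SH": 1.5, "SM": 1.2, "SL": 0.9, "BH": 1.1, "BM": 0.8, "BL": 0.5}
hml, smb = hml_smb(month)
print(f"HML = {hml:.2f}, SMB = {smb:.2f}")
```

Averaging across the two size groups in the HML leg, and across the three book-to-market groups in the SMB leg, is what keeps each factor roughly neutral with respect to the other sorting variable.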

To create the 25 size and book-to-market portfolio returns, I use cutoffs at 20{\scriptstyle \%}, 40{\scriptstyle \%}, 60{\scriptstyle \%}, and 80{\scriptstyle \%} for both the size and book-to-market ratio dimensions. To create the 9 size and book-to-market portfolio returns, I use cutoffs at 33{\scriptstyle \%} and 66{\scriptstyle \%} for both the size and book-to-market ratio dimensions.

To estimate a firm’s historical exposure to the \mathrm{HML} factor, I take all of the firms in each of the 9 size and book-to-market ratio buckets as of July each year t. For each of these firms, I then estimate the following time series regression from January of (t-3) to December of (t-1) for a total of 36 months:

(14)   \begin{align*} r_{n,t} &= \alpha_n + \beta_{\mathrm{Mkt},n} \cdot r_{\mathrm{Mkt},t} + \beta_{\mathrm{HML},n} \cdot f_{\mathrm{HML},t} + \beta_{\mathrm{SMB},n} \cdot f_{\mathrm{SMB},t} + \varepsilon_{n,t} \end{align*}

I harvest the regression coefficients and sort the stocks into 5 buckets based on the realized \beta_{\mathrm{HML},n} loadings to assign a value of Z_n to each firm using cutoffs at 20{\scriptstyle \%}, 40{\scriptstyle \%}, 60{\scriptstyle \%}, and 80{\scriptstyle \%}. Thus, a firm in the Z_n = z_H bucket in July 2005 had a \beta_{\mathrm{HML},n} loading from January 2002 to December 2004 that was among the highest 20{\scriptstyle \%} within its size and book-to-market grouping. I drop the 6 month period between December 2004 and July 2005 because it appears that the returns to stocks in the \mathrm{HML} portfolio behave abnormally over this period as illustrated in Figure 7 below.
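The pre-formation sort can be sketched as follows for a single size and book-to-market bucket. The returns are simulated, the helper `ols_multi` is a hypothetical stand-in for whatever regression routine you prefer, and the bucket size of 50 firms is arbitrary. The sketch estimates Equation (14) over 36 months for each firm and then assigns Z_n by ranking the estimated \beta_{\mathrm{HML},n} within the bucket.

```python
import random

random.seed(3)

def ols_multi(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                m = A[r][c] / A[c][c]
                A[r] = [a - m * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

MONTHS, FIRMS = 36, 50
mkt = [random.gauss(0.5, 4.0) for _ in range(MONTHS)]
hml = [random.gauss(0.4, 2.5) for _ in range(MONTHS)]
smb = [random.gauss(0.2, 3.0) for _ in range(MONTHS)]
X = [[1.0, m, h, s] for m, h, s in zip(mkt, hml, smb)]

# Estimate each firm's pre-formation HML loading (coefficient index 2).
beta_hml = {}
for n in range(FIRMS):
    true_b = random.uniform(-0.5, 1.5)           # hypothetical true loading
    r = [1.0 * m + true_b * h + 0.5 * s + random.gauss(0.0, 5.0)
         for m, h, s in zip(mkt, hml, smb)]
    beta_hml[n] = ols_multi(X, r)[2]

# Rank within the bucket and cut at 20%, 40%, 60%, and 80%.
labels = ["z_L", "z_2", "z_3", "z_4", "z_H"]
ranked = sorted(beta_hml, key=beta_hml.get)
Z = {n: labels[rank * 5 // FIRMS] for rank, n in enumerate(ranked)}
print(sum(1 for z in Z.values() if z == "z_H"), "firms in z_H")
```

With only 36 monthly observations and noisy returns, the estimated loadings are imprecise, so the quintile assignment is itself noisy.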

FIGURE 7. Pre-formation returns to stocks in the HML portfolio for formation dates during the period from July 1963 to July 1993. The thick black line represents the mean value, the vertical bars represent the 95{\scriptstyle \%} confidence bounds around this mean in each month, and the 2-digit numbers label the realized returns to stocks in the HML portfolio \tau months prior to portfolio formation in the year 19\mathrm{YY}. This figure corresponds to Figure 1 in Daniel and Titman (1997).

Now comes the punchline of the paper: a portfolio that is long firms in the high distress group, z_H, and short firms in the low distress group, z_L, within each of the 9 size and book-to-market buckets generates abnormal returns relative to the Fama and French (1993) 3 factor model. To see this, first take a look at Figure 8 below. Just as in Figure 2, it’s clear that a stock’s average excess returns rise as it becomes smaller and its book-to-market ratio gets larger. i.e., the average height of the numbers increases as you move northwest across the panels. However, Figure 8 also shows that, within each of the 9 size and book-to-market portfolios, firms with higher historical loadings on the \mathrm{HML} factor tend to earn higher excess returns. i.e., the average height of the numbers increases as you move from left to right within each of the panels. What’s more, moving to Figure 9 reveals that this effect is robust to the Fama and French (1993) 3 factor model. Figure 9 plots the coefficient estimates and standard errors to the 9 time series regressions:

(15)   \begin{align*} r_{z_H,t+1} - r_{z_L,t+1} &= \alpha + \beta_{\mathrm{Mkt},t+1} \cdot r_{\mathrm{Mkt},t+1} + \beta_{\mathrm{HML},t+1} \cdot f_{\mathrm{HML},t+1} + \beta_{\mathrm{SMB},t+1} \cdot f_{\mathrm{SMB},t+1} + \varepsilon_{t+1} \end{align*}

All of the estimated \alphas are positive except for 1, 2 are statistically significant at the 5{\scriptstyle \%} level, and 2 more are quite close to this threshold. By contrast, a purely factor model explanation would predict that all of these \alphas should be 0.
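For readers who want to see the mechanics of the test, here is a sketch of Equation (15) on simulated data. The helpers `invert` and `ols_with_se`, the factor parameters, and the built-in abnormal return of 0.3 per month are all hypothetical and are not taken from Daniel and Titman (1997). The standard errors are plain homoskedastic OLS ones, which may not match what the paper uses.

```python
import math
import random

random.seed(5)

def invert(M):
    """Matrix inverse by Gauss-Jordan elimination with partial pivoting."""
    k = len(M)
    A = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
         for i, row in enumerate(M)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        piv = A[c][c]
        A[c] = [a / piv for a in A[c]]
        for r in range(k):
            if r != c:
                m = A[r][c]
                A[r] = [a - m * b for a, b in zip(A[r], A[c])]
    return [row[k:] for row in A]

def ols_with_se(X, y):
    """OLS coefficients and homoskedastic standard errors."""
    k, T = len(X[0]), len(y)
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    inv = invert(XtX)
    b = [sum(inv[i][j] * Xty[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(bi * xi for bi, xi in zip(b, r)) for r, yi in zip(X, y)]
    s2 = sum(e * e for e in resid) / (T - k)
    se = [math.sqrt(s2 * inv[i][i]) for i in range(k)]
    return b, se

T = 246  # months from July 1973 to December 1993
mkt = [random.gauss(0.5, 4.0) for _ in range(T)]
hml = [random.gauss(0.4, 2.5) for _ in range(T)]
smb = [random.gauss(0.2, 3.0) for _ in range(T)]
# Simulated z_H minus z_L spread with a built-in abnormal return of 0.3
spread = [0.3 + 0.1 * m + 0.5 * h + random.gauss(0.0, 1.0)
          for m, h in zip(mkt, hml)]

X = [[1.0, m, h, s] for m, h, s in zip(mkt, hml, smb)]
coef, se = ols_with_se(X, spread)
t_alpha = coef[0] / se[0]
print(f"alpha = {coef[0]:.3f} (t = {t_alpha:.2f})")
```

Under a pure factor model the intercept would be indistinguishable from zero. In this simulation it is not, because the abnormal return was built in by hand.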

FIGURE 8. Mean monthly excess returns of the 45 portfolios sorted on size, book-to-market, and pre-formation HML factor loading using data from July 1973 to December 1993. The blue numbers labelled “Actual” correspond to the values reported in Table 3 of Daniel and Titman (1997). The red numbers labelled “Estimated” correspond to the values that I calculated. e.g., this figure reads that I estimate the average, value-weighted, monthly excess return of stocks in the lowest size tercile, highest book-to-market tercile, and lowest pre-formation HML factor loading quintile to be 0.906{\scriptstyle \%/\mathrm{mo}} while the value reported in Table 3 of Daniel and Titman (1997) is 1.211{\scriptstyle \%/\mathrm{mo}}.

FIGURE 9. Estimated coefficients and R^2s from the regression in Equation (15) estimated within each of the 9 size and book-to-market buckets. The dots represent the point estimates. The vertical lines represent the 95{\scriptstyle \%} confidence intervals. All statistically significant coefficients are flagged in red. e.g., this figure reads that within the group of stocks with the lowest book-to-market ratio and the highest market capitalization (i.e., the bottom left panel), firms with the highest historical loading on the \mathrm{HML} factor (i.e., the most distressed firms) had excess returns that were 0.87{\scriptstyle \%/\mathrm{mo}} higher than firms with the lowest historical loading on the \mathrm{HML} factor (i.e., the least distressed firms). The estimated values in this figure correspond to the values reported in Table 6 of Daniel and Titman (1997).

5. Discussion

Daniel and Titman (1997) is a really nice paper that makes a very simple and insightful point: factor loadings do not imply a causal relationship. They support this point by giving evidence that even after controlling for factor exposure, firms that are more distressed prior to portfolio formation (i.e., have a distress characteristic) earn higher returns. However, there is a big caveat that comes with the findings. Namely, any characteristics-based model of stock returns necessarily admits arbitrage. After all, a characteristics-based explanation for the value premium says that by choosing stocks with different characteristics, you can change your portfolio’s average return without adjusting its risk loadings. i.e., you can create an arbitrage opportunity. This fact makes it difficult to interpret the phrase “characteristics-based explanation.” As Arthur Eddington (1934) wrote, “it is a good rule not to put overmuch confidence in the observational results that are put forward until they have been confirmed by theory.”

Volatility Decomposition of a Typical Firm

1. Introduction

This post reviews the analysis in Campbell, Lettau, Malkiel, and Xu (2001) who find that firm-level volatility has been rising over the period from July 1962 to December 1997. I’ve posted the code I used here. What does this mean? The authors look at the day-to-day variations in the stock returns of all publicly listed firms on the NYSE, Amex, and NASDAQ exchanges in each month, t, during this sample. They then show how to decompose the variance of the daily returns each month for a typical firm (e.g., a firm selected randomly with probability proportional to its market cap) into a market-specific component, an industry-specific component, and a firm-specific component. Campbell, Lettau, Malkiel, and Xu (2001) find that this firm-specific variance has been steadily rising as plotted in the figure below.

Annualized firm-specific variance.

Annualized variance within each month of daily firm returns relative to the firm’s industry’s value weighted return for the period from July 1962 to December 1997.

I begin by detailing how Campbell, Lettau, Malkiel, and Xu (2001) estimate their market-specific, industry-specific, and firm-specific (i.e., idiosyncratic) volatility components. A natural first approach would be to estimate I industry-level regressions:

(1)   \begin{align*} r_{i,t} &= \beta_{i,m} \cdot r_{m,t} + \tilde{\epsilon}_{i,t} \end{align*}

and J firm-level regressions:

(2)   \begin{align*} \begin{split} r_{j,t} &= \beta_{j,i} \cdot r_{i,t} + \tilde{\eta}_{j,t} \\ &= \beta_{j,i} \cdot \beta_{i,m} \cdot r_{m,t} + \beta_{j,i} \cdot \tilde{\epsilon}_{i,t} + \tilde{\eta}_{j,t} \end{split} \end{align*}

where r_{m,t} denotes the value-weighted excess return on the market, r_{i,t} denotes the value-weighted excess return on industry i, and r_{j,t} denotes the excess return on stock j. You could then just insert the realized \beta_{j,i} and \beta_{i,m} terms into expressions for the variation in r_{i,t} and r_{j,t} to get the desired result, no?

(3)   \begin{align*} \begin{split} \mathrm{Var}[r_{i,t}] &= \beta_{i,m}^2 \cdot \mathrm{Var}[r_{m,t}] + \mathrm{Var}[\tilde{\epsilon}_{i,t}] \\ \mathrm{Var}[r_{j,t}] &= \beta_{j,m}^2 \cdot \mathrm{Var}[r_{m,t}] + \beta_{j,i}^2 \cdot \mathrm{Var}[\tilde{\epsilon}_{i,t}] + \mathrm{Var}[\tilde{\eta}_{j,t}] \end{split} \end{align*}

The problem here is that \beta_{j,i} and \beta_{j,m} (the product \beta_{j,i} \cdot \beta_{i,m} from Equation (2)) are hard to estimate and may well vary over time. Thus, a \beta-independent procedure is necessary. After describing this procedure in Section 2, I then replicate the variance time series used in Campbell, Lettau, Malkiel, and Xu (2001) in Section 3. Finally, in Section 4 I conclude by extending the sample period to December 2012 and discussing the interpretation of the results.

The analysis in Campbell, Lettau, Malkiel, and Xu (2001) ties in closely with numerous other findings in asset pricing, macroeconomics, and behavioral finance. In the empirical asset pricing literature, Ang, Hodrick, Xing, and Zhang (2006) find a puzzling result that firms with high idiosyncratic volatility “have abysmally low average returns.” e.g., stocks with the highest idiosyncratic volatility have excess returns that are 1.06{\scriptstyle \%/\mathrm{mo}} lower than stocks with the lowest idiosyncratic volatility. In a macroeconomic context, this analysis directly supports the granular origins theory of Gabaix (2011) which proposes that the key source of aggregate macroeconomic fluctuations is idiosyncratic firm-specific shocks to large firms. Finally, in a behavioral finance setting, the fact that market and industry models explain so little of the variation in firm-level stock returns (somewhere between 20{\scriptstyle \%} and 30{\scriptstyle \%}) suggests a new kind of problem for traders: scarce attention. “What this information consumes is rather obvious,” writes Herbert Simon. “It consumes the attention of its recipients. Hence a wealth of information creates a poverty of attention, and a need to allocate that attention efficiently among the overabundance of information sources that might consume it.” As highlighted in Chinco (2012), it takes time to sift through all of the competing information about each and every firm and subset of firms.

2. Statistical Model

In this section I explain how Campbell, Lettau, Malkiel, and Xu (2001) compute the market-wide, industry-level, and idiosyncratic contributions to the variation in a firm’s daily returns. The challenge to doing this is that estimating the firm-specific \betas empirically is noisy and unreliable. To get around this challenge, Campbell, Lettau, Malkiel, and Xu (2001) decompose the daily return variance of a “typical” firm rather than every firm. e.g., they think about the determinants of the daily return variance of a firm selected at random each month from the market with probability proportional to its relative market capitalization.

To see why looking at a typical firm might be helpful here, define the relative market capitalization of an entire industry, w_{i,t}, and the relative market capitalization of a particular firm, w_{j,t}, in month t as follows:

(4)   \begin{align*} w_{i,t} &= \frac{\sum_{j \in J[i]} \mathrm{MCAP}_{j,t}}{\sum_{j \in J} \mathrm{MCAP}_{j,t}} \qquad \text{and} \qquad w_{j,t} = \frac{\mathrm{MCAP}_{j,t}}{\sum_{j \in J} \mathrm{MCAP}_{j,t}} \end{align*}

where \sum_{i \in I} w_{i,t} = 1 and \sum_{j \in J} w_{j,t} = 1. I use the notation that j denotes a particular firm in the market, and use J[i] \subset J to denote the subset of firms in industry i: J[i] = \{ j \in J \mid \mathrm{Industry}(j) = i \}. The key insight is that the value weighted sums of the industry-level and firm-specific \betas have to sum to unity:

(5)   \begin{align*} 1 &= \sum_{i \in I} w_{i,t} \cdot \beta_{i,m} \qquad \text{and} \qquad 1 = \sum_{j \in J[i]} \left( \frac{w_{j,t}}{w_{i,t}} \right) \cdot \beta_{j,i} \end{align*}

This observation follows mechanically from the fact that the market return is just the value-weighted sum of its industry constituents and the industry returns are just the value-weighted sums of their firm constituents:

(6)   \begin{align*} r_{m,t} &= \sum_{i \in I} w_{i,t} \cdot r_{i,t} \qquad \text{and} \qquad r_{i,t} = \sum_{j \in J[i]} \left( \frac{w_{j,t}}{w_{i,t}} \right) \cdot r_{j,t} \end{align*}

By sampling appropriately, you can eliminate the pesky \betas from Equation (3) by converting their weighted sums into 1s.
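A quick numerical check of the first identity in Equation (5) may help. The sketch below uses made-up industry returns and hypothetical constant weights. It builds the market return as in Equation (6), estimates each \beta_{i,m} as a covariance ratio, and confirms that the value-weighted sum of the \betas equals 1 up to floating point error.

```python
import random
import statistics

random.seed(2)

T = 250
w = [0.5, 0.3, 0.2]                   # hypothetical industry weights, sum to 1
common = [random.gauss(0.0, 1.0) for _ in range(T)]
sens = [0.8, 1.1, 1.4]                # made-up sensitivities to a common shock
r_ind = [[s * c + random.gauss(0.0, 0.5) for c in common] for s in sens]

# Market return as the value-weighted sum of industry returns, Equation (6)
r_mkt = [sum(wi * r[t] for wi, r in zip(w, r_ind)) for t in range(T)]

def cov(x, y):
    """Population covariance of two equal-length series."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)

# Regression beta of each industry on the market
betas = [cov(r, r_mkt) / cov(r_mkt, r_mkt) for r in r_ind]
weighted_sum = sum(wi * b for wi, b in zip(w, betas))
print([round(b, 3) for b in betas], round(weighted_sum, 12))
```

The identity holds because covariance is linear: the weighted sum of \mathrm{Cov}[r_{i,t}, r_{m,t}] is just \mathrm{Cov}[r_{m,t}, r_{m,t}]. Note that the check relies on the weights being held fixed within the estimation window.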

How would you go about decomposing the variance of a typical firm in practice? First, consider estimating the market-wide and industry-specific variance components in a \beta-independent fashion. Instead of running the regression in Equation (1), consider computing the difference between each industry’s value weighted return, r_{i,t}, and the value weighted return on the market, r_{m,t}:

(7)   \begin{align*} r_{i,t} &= r_{m,t} + \epsilon_{i,t} \end{align*}

where \epsilon_{i,t} now lacks the tilde and is connected to \tilde{\epsilon}_{i,t} via the relationship:

(8)   \begin{align*} \epsilon_{i,t} &= \tilde{\epsilon}_{i,t} + (\beta_{i,m} - 1) \cdot r_{m,t} \end{align*}

Since \epsilon_{i,t} is not a regression residual, it will not be orthogonal to the market return, \mathrm{Cov}[r_{m,t},\epsilon_{i,t}] \neq 0. Thus, when computing the variance of the value weighted industry return, r_{i,t}, we get:

(9)   \begin{align*} \mathrm{Var}[r_{i,t}] &= \mathrm{Var}[r_{m,t}] + \mathrm{Var}[\epsilon_{i,t}] + 2 \cdot \mathrm{Cov}[r_{m,t},\epsilon_{i,t}] \\ &= \mathrm{Var}[r_{m,t}] + \mathrm{Var}[\epsilon_{i,t}] + 2 \cdot (\beta_{i,m} - 1) \cdot \mathrm{Var}[r_{m,t}] \end{align*}

Applying the sampling trick described above then allows us to remove the \betas by averaging over all industries:

(10)   \begin{align*} \sum_{i \in I} w_{i,t} \cdot \mathrm{Var}[r_{i,t}] &= \sum_{i \in I} w_{i,t} \cdot \left\{ (2 \cdot \beta_{i,m} - 1) \cdot \mathrm{Var}[r_{m,t}] + \mathrm{Var}[\epsilon_{i,t}] \right\} \\  &= \mathrm{Var}[r_{m,t}] + \sum_{i \in I} w_{i,t} \cdot \mathrm{Var}[\epsilon_{i,t}] \end{align*}
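The step between the two lines of Equation (10) uses both normalizations at once: the weights sum to 1, and by Equation (5) the value-weighted \betas also sum to 1. Spelled out for the market term:

\begin{align*} \sum_{i \in I} w_{i,t} \cdot (2 \cdot \beta_{i,m} - 1) \cdot \mathrm{Var}[r_{m,t}] &= \left( 2 \cdot \sum_{i \in I} w_{i,t} \cdot \beta_{i,m} - \sum_{i \in I} w_{i,t} \right) \cdot \mathrm{Var}[r_{m,t}] \\ &= (2 \cdot 1 - 1) \cdot \mathrm{Var}[r_{m,t}] \\ &= \mathrm{Var}[r_{m,t}] \end{align*}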

This result says that if you select an industry i \in I with probability \mathrm{Pr}[i] = w_{i,t}, then the expected variance of this industry’s daily returns in month t will consist of a market component, \mathrm{Var}[r_{m,t}], and an industry-specific component, \sum_{i \in I} w_{i,t} \cdot \mathrm{Var}[\epsilon_{i,t}]. When the market component is big:

(11)   \begin{align*}  \frac{\mathrm{Var}[r_{m,t}]}{\mathrm{Var}[r_{m,t}] + \sum_{i \in I} w_{i,t} \cdot \mathrm{Var}[\epsilon_{i,t}]} \end{align*}

then most of the variation in value weighted industry returns tends to come from broad market-wide shocks. Conversely, if the industry-specific component is relatively large, then most of the variation in value weighted industry returns tends to come from different industry-specific shocks which are only felt in their particular corner of the market.

Next consider estimating the firm-specific variance component in a \beta-independent way using the same procedure. Instead of running the regression in Equation (2), I compute the difference between each firm’s excess returns and the value weighted excess returns on its industry:

(12)   \begin{align*} r_{j,t} &= r_{i,t} + \eta_{j,t} \\ \eta_{j,t} &= \tilde{\eta}_{j,t} + (\beta_{j,i} - 1) \cdot r_{i,t} \end{align*}

Since \eta_{j,t} is not a regression residual, it will no longer be orthogonal to the value weighted industry return, \mathrm{Cov}[r_{i,t},\eta_{j,t}] \neq 0, and thus:

(13)   \begin{align*} \mathrm{Var}[r_{j,t}] &= \mathrm{Var}[r_{i,t}] + \mathrm{Var}[\eta_{j,t}] + 2 \cdot \mathrm{Cov}[r_{i,t},\eta_{j,t}] \\ &= \mathrm{Var}[r_{i,t}] + \mathrm{Var}[\eta_{j,t}] + 2 \cdot (\beta_{j,i} - 1) \cdot \mathrm{Var}[r_{i,t}] \end{align*}

However, the same sampling trick means that the expression for the value weighted average variance over all stocks within each industry will be \beta-independent:

(14)   \begin{align*} \sum_{j \in J[i]} \left( \frac{w_{j,t}}{w_{i,t}} \right) \cdot \mathrm{Var}[r_{j,t}] &= \sum_{j \in J[i]} \left( \frac{w_{j,t}}{w_{i,t}} \right) \cdot \left\{ (2 \cdot \beta_{j,i} - 1) \cdot \mathrm{Var}[r_{i,t}] + \mathrm{Var}[\eta_{j,t}] \right\} \\  &= \mathrm{Var}[r_{i,t}] + \sum_{j \in J[i]} \left( \frac{w_{j,t}}{w_{i,t}} \right) \cdot \mathrm{Var}[\eta_{j,t}] \end{align*}

The interpretation of this equation is similar to the interpretation of the market-to-industry decomposition above. Putting both of these pieces together then gives the full decomposition as follows:

(15)   \begin{align*} \sum_{j \in J} w_{j,t} \cdot \mathrm{Var}[r_{j,t}] &= \sum_{i \in I} w_{i,t} \cdot \mathrm{Var}[r_{i,t}] + \sum_{j \in J} w_{j,t} \cdot \mathrm{Var}[\eta_{j,t}] \\ &= \mathrm{Var}[r_{m,t}] + \sum_{i \in I} w_{i,t} \cdot \mathrm{Var}[\epsilon_{i,t}] + \sum_{j \in J} w_{j,t} \cdot \mathrm{Var}[\eta_{j,t}] \\ &= \sigma_{\mathrm{Mkt},t}^2 + \sigma_{\mathrm{Ind},t}^2 + \sigma_{\mathrm{Firm},t}^2  \end{align*}

3. Trends in Volatility

I now replicate these 3 variance measures from Campbell, Lettau, Malkiel, and Xu (2001) using daily and monthly CRSP data on NYSE, AMEX, and NASDAQ stocks over the sample period from July 1962 to December 1997. I restrict the data to include only common stocks with share prices above \mathdollar 1. The riskless rate corresponds to the 30 day T-Bill rate. First, to compute the empirical analogue of the market variance component in Equation (15), I compute the mean and variance of daily excess returns on the value-weighted market:

(16)   \begin{align*} \hat{\mu}_{\mathrm{Mkt},t} &= \frac{1}{S} \cdot \sum_{s = 1}^S r_{m,t-s} \\ \hat{\sigma}_{\mathrm{Mkt},t}^2 &= \frac{1}{S} \cdot \sum_{s = 1}^S \left( r_{m,t-s} - \hat{\mu}_{\mathrm{Mkt},t} \right)^2 \end{align*}

where S denotes the number of days in month t. I plot the annualized market component of the variance of daily firm returns in the figure below.

Annualized market-wide variance.

Annualized variance within each month of the daily value weighted market return in excess of the 30 day T-bill rate for the period July 1962 to December 1997.

Next, I compute the industry-specific contribution to the variance of a typical firm’s daily excess returns using the Fama and French (1997) industry classification codes. There are 48 industries in this classification system, and I code all stocks without a specified industry as their own group leading to 49 total industries. Clicking on the image below gives a plot of the number of firms in each of these industries.

Industry size distribution.

Click to embiggen. Number of firms in each industry from July 1962 to December 2012.

The industry-specific contribution to the variance in daily excess returns of a typical firm is then given by:

(17)   \begin{align*} \hat{\sigma}_{\mathrm{Ind},t}^2 &= \sum_{i \in I} w_{i,t} \cdot \left\{ \frac{1}{S} \cdot \sum_{s = 1}^S \left( r_{i,t-s} - r_{m,t-s} \right)^2 \right\} \end{align*}

I plot the resulting time series in the figure below.

Annualized industry-specific variance.

Annualized variance within each month of daily value weighted industry returns in excess of the value weighted market return for the period from July 1962 to December 1997.

Finally, I compute the idiosyncratic contribution to the variance of the daily excess returns of a typical firm as follows:

(18)   \begin{align*} \hat{\sigma}_{\mathrm{Firm},t}^2 &= \sum_{j \in J} w_{j,t} \cdot \left\{ \frac{1}{S} \cdot \sum_{s = 1}^S \left( r_{j,t-s} - r_{i,t-s} \right)^2 \right\} \end{align*}

This is the time series I plotted in the introduction. Empirically, it seems that the overwhelming majority of the daily variation in firm-level excess returns comes from idiosyncratic shocks. One way to quantify this statement is to look at the “model” fit each month:

(19)   \begin{align*} 1 - \mathrm{Err}_t &= \frac{\hat{\sigma}_{\mathrm{Mkt},t}^2 + \hat{\sigma}_{\mathrm{Ind},t}^2}{\hat{\sigma}_{\mathrm{Mkt},t}^2 + \hat{\sigma}_{\mathrm{Ind},t}^2 + \hat{\sigma}_{\mathrm{Firm},t}^2} \end{align*}

where the model corresponds to a market model with industry factors. The (1 - \mathrm{Err}_t) term captures the fraction of the daily variation in firm-level excess returns that is explained by market and industry factors in each month and is consistently below 0.30. i.e., more than 70{\scriptstyle \%} of the daily variation is explained by firm-specific shocks!

Model fit.

Fraction of the daily variation in firm-level excess returns that is explained by value-weighted market and industry factors in each month from July 1962 to December 1997.
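To make Equations (16) through (19) concrete, here is a minimal sketch of how the three variance components and the model fit could be computed for a single month. It assumes the daily excess returns are already arranged in an array with one column per firm. The function name `variance_components`, the input layout, and the simulated inputs are hypothetical, and the annualization step is left out.

```python
import numpy as np

def variance_components(r_firm, w_firm, industry):
    """One-month variance decomposition following Equations (16)-(19).

    r_firm   : (S, J) daily excess returns for J firms over S days
    w_firm   : (J,) market-cap weights that sum to one
    industry : (J,) industry label for each firm
    """
    r_firm = np.asarray(r_firm, dtype=float)
    w_firm = np.asarray(w_firm, dtype=float)
    industry = np.asarray(industry)

    r_m = r_firm @ w_firm                               # value weighted market return
    sigma2_mkt = np.mean((r_m - r_m.mean()) ** 2)       # Equation (16)

    sigma2_ind = 0.0
    sigma2_firm = 0.0
    for i in np.unique(industry):
        members = industry == i
        w_i = w_firm[members].sum()                     # industry weight
        r_i = r_firm[:, members] @ (w_firm[members] / w_i)   # value weighted industry return
        sigma2_ind += w_i * np.mean((r_i - r_m) ** 2)   # Equation (17)
        gap = r_firm[:, members] - r_i[:, None]         # firm return minus industry return
        sigma2_firm += np.sum(w_firm[members] * np.mean(gap ** 2, axis=0))   # Equation (18)

    explained = sigma2_mkt + sigma2_ind
    fit = explained / (explained + sigma2_firm)         # Equation (19)
    return sigma2_mkt, sigma2_ind, sigma2_firm, fit

# Simulated inputs: 21 trading days, 6 firms in 3 industries.
rng = np.random.default_rng(1)
sim_r = rng.normal(0.0, 0.02, size=(21, 6))
sim_w = np.array([0.3, 0.2, 0.2, 0.1, 0.1, 0.1])
sim_ind = np.array([0, 0, 1, 1, 2, 2])
print(variance_components(sim_r, sim_w, sim_ind))
```

Note that the industry and firm terms use raw squared differences, as in Equations (17) and (18), rather than demeaned ones.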

4. Discussion

There are a couple of interesting take away facts from this analysis. First, the nature of the 3 variance time series dramatically changes after December 1997 when the original sample period in Campbell, Lettau, Malkiel, and Xu (2001) ends. Specifically, the post 1997 time period is dominated by a pair of volatility spikes which are not firm-specific: the dot-com boom and the financial crisis. When compared to these events, the slow run up in the firm-specific variance component looks relatively minor.

Updated sample.

Top Panel: Annualized variance within each month of the daily value weighted market return in excess of the 30 day T-bill rate for the period from July 1962 to December 2012. Middle Panel: Annualized variance within each month of daily value weighted industry returns in excess of the value weighted market return for the period from July 1962 to December 2012. Bottom Panel: Annualized variance within each month of daily firm returns relative to the firm’s industry’s value weighted return for the period from July 1962 to December 2012.

Nevertheless, even with these large macroeconomic shocks, the majority of the variation in daily firm-level excess returns is driven by firm-specific information. e.g., even during this later period the model fit, (1 - \mathrm{Err}_t), scarcely crosses the 40{\scriptstyle \%} threshold in spite of all of the systemic risk in the market! What’s more, there is substantial cyclicality in the model fit at roughly the 5{\scriptstyle \mathrm{yr}} horizon. i.e., once every 5{\scriptstyle \mathrm{yr}} the predictive power of the value weighted market and industry factors grows and then shrinks by roughly 10{\scriptstyle \%}, or around 1/3-to-1/2 of its baseline. One way to interpret this finding is that there should be 5{\scriptstyle \mathrm{yr}} cycles in the profitability of technical analysis using market-wide factors.

Updated sample.

Fraction of the daily variation in firm-level excess returns that is explained by value-weighted market and industry factors in each month from July 1962 to December 2012.

Effective Financial Theories

1. Introduction

One of the most astonishing things about financial markets is that there is interesting economics operating at so many different scales. Yet, no one would ever guess this fact by looking at standard asset pricing theory. To illustrate, take a look at the canonical Euler equation:

(1)   \begin{align*} p_{n,t} &= \mathrm{E}_t \left[ m_{t+1} \cdot \left(p_{n,t+1} + d_{n,t+1}\right) \right] \end{align*}

Here, p_{n,t} and d_{n,t} denote the ex-dividend price and dividend payout of the nth asset in the economy at time t, m_{t+1} denotes the prevailing stochastic discount factor, and \mathrm{E}_t(\cdot) denotes the conditional expectations operator given time t information. Equation (1) says that the price of the nth asset in the current period, t, is equal to the expected discounted value of the asset’s price and dividend payout in the following period, (t+1). At first glance this formulation seems perfectly sensible, but a closer look reveals two striking features:

  1. Time is dimensionless. i.e., Equation (1) is written in sequence time not wall clock time. Each period could equally well represent a millisecond, an hour, a year, a millennium, or anything in between. We usually think of the stochastic discount factor, m_{t+1}, as a function of traders’ utility from aggregate consumption. Thus, as Cochrane (2001) points out, if “stocks go up between 12:00 and 1:00, it must be because (on average) we all decided to have a big lunch…. this seems silly.”
  2. The total number of stocks doesn’t show up anywhere in Equation (1). Not only do traders have to know when there is a profitable arbitrage opportunity somewhere out there in the market, they also have to find out exactly where this opportunity is and deploy the necessary funds and expertise to exploit it. Where’s Waldo? puzzles are hard for a reason. Identifying and trading into arbitrage opportunities is a fundamentally different activity when searching through 10000 rather than 10 predictors. More is different. This is the key insight highlighted in Chinco (2012).

In this post, I start by writing down a simple statistical model of returns in Section 2 which allows for shocks at different time horizons and across asset groupings of various sizes. Then, in Sections 3 and 4, I show how shocks at vastly different scales are difficult for traders to spot (…let alone act on). Such shocks can look like noise to “distant” traders in a mathematically precise sense. In Section 5, I conclude with a discussion of these observations. The key take away is that financial theories do not necessarily need to be globally applicable to make effective local predictions. e.g., a theory governing the optimal behavior of a high frequency trader may not have any testable predictions at the quarterly investment horizon where institutional investors operate.

2. Statistical Model

I start by writing down a statistical model of returns that allows for shocks at different time scales and across asset groupings of different sizes. e.g., Apple’s stock returns might be simultaneously affected by not only bid-ask bounce at the 100{\scriptstyle \mathrm{ms}} investment horizon but also momentum at the 1{\scriptstyle \mathrm{mo}} investment horizon. Alternatively, at the 1{\scriptstyle \mathrm{qtr}} investment horizon Apple might realize both an earnings announcement shock as well as a national economic shock felt by all US firms.

Let \hbar denote the smallest investment horizon, so that all other time scales are indexed by an A_h = 1,2,3,\ldots:

(2)   \begin{align*}   h &= A_h \cdot \hbar \end{align*}

For concreteness, you might think about \hbar = (\mathrm{something}) \times 10^{-3}{\scriptstyle \mathrm{sec}} in modern asset markets. Thus, for a monthly investment horizon A_{\mathrm{month}} = (\mathrm{something}) \times 10^9 meaning that asset market investment horizons span somewhere between 9 and 11 orders of magnitude from high frequency traders to buy and hold value investors. This is a similar ratio to the ratio of the height of a human to the diameter of the sun.

Click to Embiggen. Source: Delphix.
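The orders of magnitude quoted above are easy to check with a few lines of arithmetic. The sketch below assumes \hbar = 1{\scriptstyle \mathrm{ms}} and a 30 day month, and uses rough round numbers for the height of a human and the diameter of the sun. All of these inputs are illustrative assumptions.

```python
import math

hbar_sec = 1e-3                          # assumed: hbar = 1 millisecond
seconds_per_month = 30 * 24 * 60 * 60    # assumed: a 30 day month
A_month = seconds_per_month / hbar_sec   # number of hbar intervals in one month
orders_of_magnitude = math.log10(A_month)

human_height_m = 1.7                     # rough figure
sun_diameter_m = 1.39e9                  # rough figure
size_ratio = sun_diameter_m / human_height_m

print(f"A_month = {A_month:.3g}, i.e. about {orders_of_magnitude:.1f} orders of magnitude")
print(f"sun diameter / human height = {size_ratio:.2g}")
```

A smaller \hbar or a longer buy and hold horizon pushes the span toward the upper end of the 9 to 11 range.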

Let r_n(t,h) denote the log price change of the nth stock from time t through time (t + h):

(3)   \begin{align*} r_n(t,h) &= \log p_n(t+h) - \log p_n(t) = \sum_{q=1}^Q \delta_q(t,h) \cdot x_{n,q} + \epsilon_n(t,h) \end{align*}

where x_{n,q} \in \{0,1\} denotes whether or not stock n has attribute q, \delta_q(t,h) denotes the mean growth rate in the price of all stocks with attribute q from time t through time (t+h), and \epsilon_n(t,h) denotes idiosyncratic noise in stock n’s percent return from time t through time (t+h). e.g., suppose that the mean growth rate of all technology stocks from January 1st, 1999 through the end of January 31st, 1999 was 120{\scriptstyle \%/\mathrm{yr}} or 10{\scriptstyle \%/\mathrm{mo}}. Then, I would write that:

(4)   \begin{align*}  \delta_{\mathrm{technology}}(\mathrm{Jan}1999,1{\scriptstyle \mathrm{mo}}) &= 0.10 \end{align*}

and Intel, Inc would realize a 10{\scriptstyle \%/\mathrm{mo}} boost in its January, 1999 returns since:

(5)   \begin{align*} x_{\mathrm{INTC},\mathrm{technology}} = 1 \end{align*}

The price shocks, \delta_q(t,h), take on the form:

(6)   \begin{align*} \delta_q(t,h) &= \sum_{a=0}^{A_h-1} \delta_q(t + a \cdot \hbar,\hbar) \quad \text{with} \quad \delta_q(t,\hbar) =  \begin{cases} s_q &\text{w/ prob} \quad \frac{1}{2} \cdot \left( 1 - e^{- f_q \cdot \hbar} \right) \\ 0 &\text{w/ prob} \quad e^{- f_q \cdot \hbar} \\ - s_q &\text{w/ prob} \quad \frac{1}{2} \cdot \left( 1 - e^{- f_q \cdot \hbar} \right) \end{cases} \end{align*}

The summation captures the idea that all shocks occur in a particular instant and then cumulate over time. e.g., there is a particular time interval, \hbar, during which a news release hits the wire or a market order flashes across the screen. Changes over time intervals longer than \hbar reflect the accumulation of changes across these tiny time intervals. The parameters s_q and f_q control the size and frequency of the qth shock. Each attribute’s size parameter has units of percent per \hbar, and the bigger the s_q the bigger the impact of the qth shock on the returns of all stocks with that attribute. Each attribute’s frequency parameter has units of shocks per \hbar, and the bigger the f_q the more often all stocks with attribute q realize a shock of size s_q. The idiosyncratic return noise is the summation of Gaussian shocks at each \hbar interval:

(7)   \begin{align*} \epsilon_n(t,h) &= \sum_{a=0}^{A_h-1} \epsilon_n(t + a \cdot \hbar,\hbar) \quad \text{with} \quad \epsilon_n(t,\hbar) \overset{\scriptscriptstyle \mathrm{iid}}{\sim} \mathrm{N}\left( 0, \sigma_u \cdot \sqrt{\hbar}\right) \end{align*}
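The return process in Equations (3), (6), and (7) is straightforward to simulate, which helps build intuition for how the s_q and f_q parameters shape returns at longer horizons. The following is a minimal sketch. The function name `simulate_returns` and all parameter values are hypothetical, and it reads \sigma_u \cdot \sqrt{\hbar} in Equation (7) as a standard deviation, which is an assumption.

```python
import numpy as np

def simulate_returns(x, s, f, sigma_u, hbar, A_h, rng):
    """Simulate r_n(t, h) for h = A_h * hbar following Equations (3), (6), (7).

    x : (N, Q) 0/1 attribute matrix
    s : (Q,) shock sizes
    f : (Q,) shock frequencies (shocks per unit of time)
    """
    x = np.asarray(x, dtype=float)
    s = np.asarray(s, dtype=float)
    f = np.asarray(f, dtype=float)
    n_stocks, n_attr = x.shape

    p_shock = -np.expm1(-f * hbar)                # 1 - exp(-f_q * hbar)
    u = rng.random((A_h, n_attr))
    # +s_q and -s_q each occur with probability p_shock / 2, zero otherwise.
    delta_hbar = np.where(u < p_shock / 2, s, np.where(u < p_shock, -s, 0.0))
    delta = delta_hbar.sum(axis=0)                # Equation (6): shocks cumulate over time

    # Equation (7): Gaussian noise at each hbar interval, treated here as
    # having standard deviation sigma_u * sqrt(hbar).
    eps = rng.normal(0.0, sigma_u * np.sqrt(hbar), size=(A_h, n_stocks)).sum(axis=0)

    return x @ delta + eps, delta                 # Equation (3)

rng = np.random.default_rng(0)
x = np.array([[1, 0], [1, 1], [0, 0]])            # 3 stocks, 2 hypothetical attributes
r, delta = simulate_returns(x, s=[0.01, 0.02], f=[5.0, 0.5],
                            sigma_u=0.001, hbar=1.0, A_h=1000, rng=rng)
print(r, delta)
```

Stocks that share an attribute inherit the same cumulated shock, so the first two stocks move together through the first attribute while the third stock only realizes idiosyncratic noise.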

3. Time Series

Very different financial theories can operate at vastly different time scales. e.g., attributes that are relevant at the millisecond time horizon will completely wash out by the monthly horizon and vice versa. In this section, I look at only the time series properties of one stock, so I suppress the n subscript and write Equation (3) as:

(8)   \begin{align*} r(t,h) &= \sum_{a = 0}^{A_h-1} r(t+a \cdot \hbar,\hbar) = \sum_{q=1}^Q \delta_q(t,h) \cdot x_q + \epsilon(t,h) \end{align*}

To see why, consider the problem of a value investor, Alice, operating at the monthly investment horizon. Suppose that she wants to know whether or not her arch nemesis Bill, a high frequency trader operating at the millisecond investment horizon, is actively trading in her asset. e.g., suppose that she is worried that Bill might have found some really clever new predictor that flits in and out of existence before she can take advantage of it. From Alice’s point of view, the random variable \delta_q(t,\hbar) has the unconditional distribution:

(9)   \begin{align*} \begin{split} \mathrm{E}\left[ \delta_q(t,\hbar) \right] &= 0 \\ \mathrm{E}\left[ \delta_q(t,\hbar)^2 \right] &= \left( 1 - e^{- f_q \cdot \hbar} \right) \cdot s_q^2 = \sigma_q^2 \\ \mathrm{E}\left[ \left| \delta_q(t,\hbar) \right|^3 \right] &= \left( 1 - e^{- f_q \cdot \hbar} \right) \cdot s_q^3 = \rho_q \end{split} \end{align*}

Let F_{A_h}(x) denote the cumulative distribution function of \delta_q(t,h)/(\sigma_q \cdot \sqrt{A_h}). i.e., F_{A_h}(x) governs the distribution of the normalized sum of the shocks that Bill sees over the length of each period from Alice’s perspective. Then, via the Berry-Esseen theorem we have that at the monthly investment horizon:

(10)   \begin{align*} \left| F_{A_h}(x) - \Phi(x) \right| &\leq \frac{0.7655 \cdot \rho_q}{\sigma_q^3 \cdot \sqrt{A_h}} = \frac{1}{\sqrt{A_h}} \cdot \left( \frac{0.7655}{\sqrt{1 - e^{- f_q \cdot \hbar}}} \right) = (\mathrm{something}) \times 10^{-5}  \end{align*}

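Equation (10) says that the maximum vertical distance between the CDF of the normalized monthly sum of a variable fluctuating at the \hbar time scale and the CDF of the standard normal distribution is on the order of 10^{-5}. i.e., at the monthly horizon the two distributions are identical to within roughly one part in one-hundred thousand.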

Berry-Esseen CDF

Click to embiggen. This image shows the distance between the cumulative distribution functions of the standard normal distribution, \Phi(x), and the empirical distribution, F_{A_h}(x), as computed above.

There are a couple of ways to put this figure in perspective. First, note that trading strategies have to generate well above 0.5{\scriptstyle \%} abnormal returns per month in order to outpace trading costs. Second, note that Alice would need around 10^{10}{\scriptstyle \mathrm{mo}} of data to distinguish between a variable drawn from the standard normal distribution and F_{A_h}(x) at this level of granularity via the Kolmogorov–Smirnov test. Thus, Bill’s behavior at the \hbar investment horizon is effectively noise to Alice when looking only at monthly data. In order to figure out what Bill is doing, she has to stoop down to his investment horizon.
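The two back-of-the-envelope numbers in this section can be reproduced with a short calculation. The sketch below evaluates the right-hand side of Equation (10) and then asks how many observations are needed before the asymptotic 5{\scriptstyle \%} Kolmogorov–Smirnov critical value, roughly 1.36/\sqrt{n}, falls below a CDF gap of 10^{-5}. The function names, the value of f_q, and the choice of A_h are hypothetical inputs chosen only for illustration.

```python
import math

def berry_esseen_bound(f_q, hbar, A_h, C=0.7655):
    """Right-hand side of Equation (10): C / sqrt(A_h * (1 - exp(-f_q * hbar)))."""
    p_shock = -math.expm1(-f_q * hbar)      # 1 - exp(-f_q * hbar)
    return C / math.sqrt(A_h * p_shock)

def ks_sample_size(distance, critical_value=1.36):
    """Observations needed before the asymptotic 5% KS critical value,
    critical_value / sqrt(n), falls below the given CDF distance."""
    return (critical_value / distance) ** 2

A_month = 2.592e9     # assumed: 1 ms intervals in a 30 day month
bound = berry_esseen_bound(f_q=1e4, hbar=1e-3, A_h=A_month)   # hypothetical f_q, in shocks per second
n_needed = ks_sample_size(1e-5)
print(f"Berry-Esseen bound: {bound:.2e}")
print(f"KS observations needed at distance 1e-5: {n_needed:.2e}")
```

The bound only reaches the 10^{-5} range when f_q \cdot \hbar is of order one or larger. For rarer shocks, the \sqrt{1 - e^{-f_q \cdot \hbar}} term in the denominator loosens the bound.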

4. Cross Section

In the same way that different financial theories can operate at different time scales, different financial theories can also operate at vastly different levels of aggregation. On one hand, this statement is a bit obvious. After all, modern financial theory is built on the idea of risk minimization through portfolio diversification, and traders talk about strategies being “market neutral”. On the other hand, diversification is not the only force at work. Financial markets have many assets and traders use a vast number of predictors. What’s more, only a few of these predictors are useful at any point in time. As Warren Buffett says, “If you want to shoot rare, fast-moving elephants, you should always carry a loaded gun.” Pulling the trigger is easy. Finding the elephant is hard. Traders face a difficult search problem when trying to parse new shocks.

Suppose that Alice is a value investor specializing in oil and gas stocks and now wants to figure out where her other arch nemesis, Charlie, is trading in her market. Even if she knows that he is trading at roughly her investment horizon, it may still be hard for her to spot his price impact due to the vast number of possible strategies that he could be employing. In this section I study the 1{\scriptstyle \mathrm{mo}} returns of N stocks with Q=7 attributes:

(11)   \begin{align*} r_n &= \sum_{q=1}^7 \delta_q \cdot x_{n,q} + \epsilon_n \end{align*}

where I suppress all the time horizon arguments since I am concerned with the cross-section. For simplicity, suppose that Alice knows that Charlie is making a bet on only 1 of the 7 attributes so that:

(12)   \begin{align*} 1 &= \Vert {\boldsymbol \delta} \Vert_{\ell_0} = \sum_{q=1}^7 1_{\{\delta_q \neq 0\}} \end{align*}

where if \delta_q \neq 0, then \delta_q = s \gg \sigma_\epsilon for all q =1,2,\ldots,7. e.g., Alice is worried that Charlie’s spotted the one way of sorting all oil and gas stocks so that all the stocks with that attribute (e.g., operations in the Chilean Andes) have high returns and all of the stocks without the attribute have low returns. How many stocks does Alice have to follow in order for her to spot the sorting rule—i.e., the non-zero entry in ({\boldsymbol \delta})_{7 \times 1}?

It turns out that Alice only needs to examine 3 stocks so long as she gets to pick exactly which ones:

  1. Stock 1: Has attributes 1, 3, 5, 7
  2. Stock 2: Has attributes 2, 3, 6, 7
  3. Stock 3: Has attributes 4, 5, 6, 7

The fact that Alice can identify the correct attribute even though she has fewer observations than possible attributes, Q \gg N, is known as compressive sensing and was introduced by Candes and Tao (2005) and Donoho (2006). See Terry Tao’s blog post for an excellent introduction. For example, suppose that only the first stock had high returns of r_1 \approx s:

(13)   \begin{align*} \underbrace{\begin{bmatrix} s \\ 0 \\ 0 \end{bmatrix}}_{(\mathbf{r})_{3 \times 1}} &\approx \underbrace{\begin{bmatrix}  1 & 0 & 1 & 0 & 1 & 0 & 1  \\  0 & 1 & 1 & 0 & 0 & 1 & 1  \\  0 & 0 & 0 & 1 & 1 & 1 & 1  \end{bmatrix}}_{(\mathbf{X})_{3 \times 7}} \underbrace{\begin{bmatrix}  s \\ 0 \\ \vdots \\ 0  \end{bmatrix}}_{({\boldsymbol \delta})_{7 \times 1}} + \underbrace{\begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \epsilon_3 \end{bmatrix}}_{({\boldsymbol \epsilon})_{3 \times 1}} \end{align*}

then Alice can be sure that Charlie has been sorting using the first of the 7 stock attributes. The interesting part is that Alice can’t identify Charlie’s strategy using fewer than N = 3 stocks since:

(14)   \begin{align*} 7 = 2^3 - 1 \end{align*}

e.g., 3 stocks give Alice just enough combinations to answer 7 yes or no questions.
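The decoding logic behind this example can be sketched in a few lines. Each column of \mathbf{X} in Equation (13) is the binary representation of its own attribute number, so the pattern of high and low returns across the 3 stocks spells out the index of the active attribute. The function name `identify_attribute` and the rule of thresholding at s/2 are illustrative assumptions that rely on s \gg \sigma_\epsilon.

```python
import numpy as np

# Attribute matrix from Equation (13): column q is the binary expansion of q,
# with stock 1 as the least significant bit.
X = np.array([[1, 0, 1, 0, 1, 0, 1],
              [0, 1, 1, 0, 0, 1, 1],
              [0, 0, 0, 1, 1, 1, 1]])

def identify_attribute(r, s):
    """Return the 1-based index of the active attribute, or 0 if there is none.

    r : returns of the 3 chosen stocks
    s : size of the shock, assumed much larger than the noise
    """
    bits = (np.asarray(r, dtype=float) > s / 2).astype(int)   # high return -> 1, low return -> 0
    return int(bits @ np.array([1, 2, 4]))

# Example: Charlie bets on attribute 6, so stocks 2 and 3 have high returns.
delta = np.zeros(7)
delta[5] = 1.0
noise = np.array([0.01, -0.01, 0.005])       # hypothetical small idiosyncratic noise
print(identify_attribute(X @ delta + noise, s=1.0))
```

The 3 binary outcomes give 2^3 = 8 patterns: one for each of the 7 attributes plus the all-low pattern, which corresponds to no shock at all.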

What’s more, this result generalizes to the case where the data matrix, \mathbf{X}, is stochastic rather than deterministic. i.e., in real life Alice can’t decide how many oil and gas stocks with each attribute are traded each period in order to make it easiest to decipher Charlie’s trading strategy. Donoho and Tanner (2009) show that in a world where \mathbf{X} is a random matrix with Gaussian entries, x_{n,q} \overset{\scriptscriptstyle \mathrm{iid}}{\sim} \mathrm{N}(0,1/N), there is a maximum number of predictors, K^*, above which it is impossible for Alice to spot K > K^* relevant attributes from among Q possibilities using only N stocks given by:

(15)   \begin{align*} N &= 2 \cdot K^* \cdot \log(Q/N) \cdot (1 + \mathrm{o}(1)) \end{align*}

and is summarized in the figure below replicated from Donoho and Stodden (2006). The x-axis runs from 0 to 1 and gives values for N/Q summarizing the relative amount of data available to Alice. The y-axis also runs from 0 to 1 and gives values for K/N summarizing the level of sparsity in the model. The underlying model is:

(16)   \begin{align*} r_n = \mathbf{x}_n {\boldsymbol \delta} + \epsilon_n \end{align*}

where \epsilon_n \overset{\scriptscriptstyle \mathrm{iid}}{\sim} \mathrm{N}(0, 1/20), {\boldsymbol \delta} is zero everywhere except for K entries which are 1, and each x_{n,q} \overset{\scriptscriptstyle \mathrm{iid}}{\sim} \mathrm{N}(0,1/\sqrt{Q}) with columns normalized to unit length. The forward stepwise regression procedure enters variables into the model in a sequential fashion, according to greatest t-statistic value. The procedure iteratively takes the single regressor with the highest t-statistic until reaching the \sqrt{2 \cdot \log Q} threshold (i.e., the Bonferroni threshold) which is roughly 3.25 when Q = 200. The K^* threshold given by Donoho and Tanner (2009) then corresponds to the white diagonal line cutting through the phase space above which the forward stepwise regression procedure fails and below which it succeeds.

Click to embiggen. This figure shows the average prediction error \Vert {\boldsymbol \delta} - \hat{\boldsymbol \delta} \Vert_{\ell_2}^2/\Vert {\boldsymbol \delta} \Vert_{\ell_2}^2 from the forward stepwise regression procedure described above.
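For readers who want to experiment with the forward stepwise procedure described above, here is a minimal sketch. It is not the code behind the figure. The function name `forward_stepwise`, the choice of N = 100, Q = 200, and K = 3, and the reading of the noise as having standard deviation 1/20 are all assumptions made for illustration.

```python
import numpy as np

def forward_stepwise(X, r, threshold=None):
    """Add the regressor with the largest |t-statistic| until none exceeds the threshold."""
    n_obs, n_attr = X.shape
    if threshold is None:
        threshold = np.sqrt(2.0 * np.log(n_attr))     # about 3.25 when Q = 200
    active = []
    while len(active) < min(n_obs - 1, n_attr):
        best_q, best_t = None, 0.0
        for q in range(n_attr):
            if q in active:
                continue
            A = X[:, active + [q]]
            coef, *_ = np.linalg.lstsq(A, r, rcond=None)
            resid = r - A @ coef
            sigma2 = resid @ resid / (n_obs - A.shape[1])      # residual variance
            se = np.sqrt(sigma2 * np.linalg.inv(A.T @ A)[-1, -1])
            t_stat = abs(coef[-1]) / se                        # t-statistic of the candidate
            if t_stat > best_t:
                best_q, best_t = q, t_stat
        if best_q is None or best_t < threshold:
            break
        active.append(best_q)
    delta_hat = np.zeros(n_attr)
    if active:
        coef, *_ = np.linalg.lstsq(X[:, active], r, rcond=None)
        delta_hat[active] = coef
    return delta_hat, active

rng = np.random.default_rng(0)
n_obs, n_attr = 100, 200
X_sim = rng.normal(size=(n_obs, n_attr))
X_sim /= np.linalg.norm(X_sim, axis=0)                # columns normalized to unit length
delta_true = np.zeros(n_attr)
delta_true[[5, 50, 150]] = 1.0                        # K = 3 relevant attributes
r_sim = X_sim @ delta_true + rng.normal(0.0, 0.05, size=n_obs)   # assumed noise sd of 1/20
delta_hat, active = forward_stepwise(X_sim, r_sim)
rel_err = np.sum((delta_true - delta_hat) ** 2) / np.sum(delta_true ** 2)
print(sorted(active), rel_err)
```

This configuration has K/N = 0.03 and N/Q = 0.5, which sits well inside the region where the procedure succeeds. Raising K while holding N and Q fixed moves the problem toward the phase transition.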

The interesting part about this result is that this bound on K^* comes from a deep theorem in high-dimensional geometry which relates both compressive sensing and error-correcting codes as suggested by the deterministic example above. It is not due to any nitty-gritty details of Alice’s search problem. Notice how the original bound in the Q=7 and N=3 example has an information theoretic interpretation! Thus, Charlie can hide behind the sheer number of possible explanations in the cross section in the same way that Bill can hide behind the sheer number of observations in the time series.

5. Discussion

The speed at which traders interact has greatly increased over the past decade. e.g., Spread Networks invested approximately \mathdollar 300{\scriptstyle \mathrm{mil}} in a new fiber optic cable linking New York and Chicago via the straightest possible route saving about 100 miles and shaving 6{\scriptstyle \mathrm{ms}} off their delay. Table 5 in Pagnotta and Philippon (2012) documents the many investments in speed made by exchanges around the world. What’s more, trading behavior at this time scale seems to be decoupled from asset fundamentals. i.e., it’s unlikely that a stock’s value truly follows any of the patterns found in one of Nanex’s crop-circle-of-the-day plots. Motivated by events such as the flash crash there has been a great deal of discussion in recent years about the impact of high frequency trading on asset prices and welfare.

However, the rough calculations above suggest that traders with a monthly investment horizon might not even care about second-to-second fluctuations in asset prices. e.g., think of how high and low frequency bands of the same radio wave can carry rock and classical music to your FM radio receiver without interfering with one another. High frequency trading may be revealing nothing about the fundamental value of the companies in the market place, but just because these traders make short-run returns behave strangely doesn’t mean that they will ruin the market for institutional investors trading at a longer horizon. In this light, perhaps the canonical Euler equation needs to have some additional input parameters, N, Q and h:

(17)   \begin{align*} p_n(t) &= \mathrm{E}_t\left[ m_{N,Q}(t,h) \cdot \left\{ p_n(t+h) + d_n(t+h) \right\} \right] \end{align*}

which define the range over which the theory is effective?
