Intra-Industry Lead-Lag Effect

1. Introduction

Hou (2007) documents an interesting phenomenon in asset markets. Namely, if the largest securities in an industry, as measured by market capitalization, perform well in the current week, then the smallest securities in that industry tend to do well in the subsequent $2$ weeks. However, the reverse relationship does not hold. i.e., if the smallest securities in an industry do well in the current week, this tells you next to nothing about how the largest securities in that industry will do in the subsequent weeks. This effect has a characteristic time scale of $1$ to $2$ weeks and varies substantially across industries.

In this post, I replicate the main finding, provide some robustness checks, and then relate the result to the analysis in my paper Feature Selection Risk (2014).

2. Data Description

I use monthly and daily CRSP data from June 1963 to December 2001 to recreate Hou (2007), Table 1. I also replicate the same results using a different industry classification system over the period from January 2000 to December 2013. I look at securities traded on the NYSE, AMEX, and NASDAQ stock exchanges. I restrict the sample to include only securities with share codes $10$ or $11$. i.e., I exclude things like ADRs, closed-end funds, and REITs. I calculate weekly returns by compounding daily returns between adjacent Wednesdays:

(1) $\begin{align*} \tilde{r}_{n,t} &= (1 + \tilde{r}_{n,\text{W}_{t-1}}) \cdot (1 + \tilde{r}_{n,\text{Th}_{t-1}}) \cdot (1 + \tilde{r}_{n,\text{F}_{t-1}}) \cdot (1 + \tilde{r}_{n,\text{M}_t}) \cdot (1 + \tilde{r}_{n,\text{Tu}_t}) - 1 \end{align*}$
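A minimal sketch of the Wednesday-to-Tuesday compounding in Equation (1), using only the Python standard library. The function names and the toy return series are hypothetical, and the sketch assumes that a week with a missing trading day simply compounds whichever days are present. It subtracts $1$ at the end so that the result is a net weekly return; drop that step if a gross return is wanted.

```python
from collections import defaultdict
from datetime import date, timedelta


def week_end_tuesday(d: date) -> date:
    """Return the Tuesday that closes the Wednesday-to-Tuesday week containing d."""
    # date.weekday(): Monday=0, Tuesday=1, Wednesday=2, ...
    return d + timedelta(days=(1 - d.weekday()) % 7)


def weekly_returns(daily: dict[date, float]) -> dict[date, float]:
    """Compound daily simple returns into net weekly returns, keyed by week-ending Tuesday."""
    gross = defaultdict(lambda: 1.0)
    for d, r in sorted(daily.items()):
        gross[week_end_tuesday(d)] *= 1.0 + r
    return {week: g - 1.0 for week, g in gross.items()}


# Hypothetical data: a flat 1% daily return on every weekday over two full weeks.
start = date(2005, 6, 1)  # a Wednesday
days = [start + timedelta(days=k) for k in range(14)]
daily = {d: 0.01 for d in days if d.weekday() < 5}
weekly = weekly_returns(daily)
```

Here `weekly` has one entry per Wednesday-to-Tuesday week, each equal to five days of $1{\scriptstyle \%}$ returns compounded together.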

I classify firms into industries in $2$ different ways. To replicate Hou (2007), Table 1, I use the $12$-industry classification system from Ken French’s website. This classification system is nice in the sense that it is built on SIC codes and can thus be extended back to the 1920s. However, the classification system that is standard in the financial industry is GICS. As a result, I also assign each firm to $1$ of $24$ different GICS industry groups.

I assign each firm to either an SIC or GICS industry based on its reported code in the monthly CRSP data as of the previous June. e.g., if I were looking at Apple, Inc. in September 2005, then I would assign Apple to its industry as of June 2005; whereas, if I were looking at Apple in May 2005, then I would assign Apple to its industry as of June 2004. I use $N_{i,y}$ to denote the number of securities in industry $i$ in year $y$. In each of the figures below, I report the average number of firms in each industry on an annual basis over the sample period:
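The previous-June rule can be sketched as a small date lookup. The function name is hypothetical, and the treatment of observations that fall in June itself is an assumption: the sketch maps them to the June of the prior year, so that a new assignment takes effect in July.

```python
from datetime import date


def formation_june(d: date) -> date:
    """Return the June whose industry code applies to an observation on date d."""
    # Assumption: June observations still use last year's code, so the
    # new assignment only takes effect from July onward.
    year = d.year if d.month > 6 else d.year - 1
    return date(year, 6, 1)


sept_2005 = formation_june(date(2005, 9, 15))  # Apple in September 2005
may_2005 = formation_june(date(2005, 5, 15))   # Apple in May 2005
```

The two calls reproduce the Apple example: the September 2005 observation maps to June 2005, and the May 2005 observation maps to June 2004.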

(2) $\begin{align*} \langle N_{i,y} \rangle &= \left\lfloor \frac{1}{Y} \cdot \sum_{y=1}^Y N_{i,y} \right\rfloor \end{align*}$

e.g., when replicating the results in Hou (2007), Table 1 I compute the average number of firms using $Y = 64$ June observations.

Each June I also sort the securities in each industry $i$ by their market cap. After sorting, I then construct an equally weighted portfolio of the largest $30{\scriptstyle \%}$ of stocks in each industry and the smallest $30{\scriptstyle \%}$ of stocks in each industry:

(3) $\begin{align*} \tilde{r}_{i,t}^B &= \frac{1}{N_{i,y_t}^{30\%}} \cdot \sum_{n=1}^{N_{i,y_t}^{30\%}} \tilde{r}_{n,t} \qquad \text{and} \qquad \tilde{r}_{i,t}^S = \frac{1}{N_{i,y_t} - N_{i,y_t}^{70\%}} \cdot \sum_{n=N_{i,y_t}^{70\%} + 1}^{N_{i,y_t}} \tilde{r}_{n,t} \end{align*}$
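As a rough illustration of this within-industry sort, the sketch below ranks the firms of a single industry by market cap and averages the returns of the top and bottom $30{\scriptstyle \%}$. The function names, tickers, and numbers are all made up, and rounding the cutoff down to a whole number of firms is an assumption rather than something stated in Hou (2007).

```python
def size_portfolios(caps: dict[str, float], pct: int = 30) -> tuple[list[str], list[str]]:
    """Split one industry's firms into the largest and smallest pct% by market cap."""
    ranked = sorted(caps, key=caps.get, reverse=True)  # largest firm first
    k = len(ranked) * pct // 100  # assumption: round the cutoff down
    return ranked[:k], ranked[len(ranked) - k:]


def equal_weighted(returns: dict[str, float], members: list[str]) -> float:
    """Equally weighted average return of the listed firms (members must be non-empty)."""
    return sum(returns[m] for m in members) / len(members)


# Hypothetical industry with 10 firms, A (largest) through J (smallest).
tickers = "ABCDEFGHIJ"
caps = {t: 100.0 - 10.0 * i for i, t in enumerate(tickers)}
week_returns = {t: 0.01 * i for i, t in enumerate(tickers)}

big, small = size_portfolios(caps)
r_big = equal_weighted(week_returns, big)
r_small = equal_weighted(week_returns, small)
```

The membership lists are fixed once per June sort, while `equal_weighted` would be re-applied to each week's returns over the following year.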

In the analysis below, I look at the relationship between the weekly returns of these $2$ portfolios over the subsequent year. Note that these are within-industry sorts. e.g., a stock in the “big” portfolio of the consumer durables industry might be small enough that it would land in the “small” portfolio if it were sorted with the telecommunications industry.

Here is the code I use to pull the data from WRDS and create the figure in Section 3 below: gist.

3. Hou (2007), Table 1

Table 1 in Hou (2007) reports the cross-autocorrelation of the big and small intra-industry portfolios defined above. To estimate these statistics, I first normalize the big and small portfolio weekly returns so that they each have a mean of $0$ and a standard deviation of $1$:

(4) $\begin{align*} \mu_B &= \mathrm{E}[\tilde{r}_{i,t}^B] \qquad \text{and} \qquad \sigma_B = \mathrm{StDev}[\tilde{r}_{i,t}^B] \\ \mu_S &= \mathrm{E}[\tilde{r}_{i,t}^S] \qquad \text{and} \qquad \sigma_S = \mathrm{StDev}[\tilde{r}_{i,t}^S] \\ r_{i,t}^B &= \frac{\tilde{r}_{i,t}^B - \mu_B}{\sigma_B} \qquad \text{and} \qquad r_{i,t}^S = \frac{\tilde{r}_{i,t}^S - \mu_S}{\sigma_S} \end{align*}$

Then, to estimate the correlation between the returns of the big portfolio in week $t$ and the subsequent returns of the small portfolio in week $(t + l)$ I run the regression:

(5) $\begin{align*} r_{i,t+l}^S &= \beta(l) \cdot r_{i,t}^B + \epsilon_{i,t+l} \qquad \text{for } l = 0,1,2,\ldots,6 \end{align*}$

and estimate $\beta(l) = \mathrm{Cor}[r_{i,t}^B,r_{i,t+l}^S]$. Because both series are normalized to have unit variance, the slope of this regression is the correlation coefficient. Similarly, to estimate the correlation between the returns of the small portfolio in week $t$ and the subsequent returns of the big portfolio in week $(t + l)$, I run the regression:

(6) $\begin{align*} r_{i,t+l}^B &= \gamma(l) \cdot r_{i,t}^S + \epsilon_{i,t+l} \qquad \text{for } l = 0,1,2,\ldots,6 \end{align*}$

to estimate $\gamma(l) = \mathrm{Cor}[r_{i,t}^S,r_{i,t+l}^B]$. The advantage of this approach over estimating a simple correlation matrix is that you can read off the standard errors from the regression results rather than rely on asymptotic results.
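To make the regressions in Equations (5) and (6) concrete, here is a small sketch on simulated data rather than CRSP returns. Everything in it is hypothetical: the series are generated so that the "small" return loads on last week's "big" return with a coefficient of $0.3$, and the helper names are made up. The slope is the no-intercept OLS estimate, which is appropriate here because both series are demeaned first.

```python
import random
from statistics import fmean, pstdev


def standardize(x: list[float]) -> list[float]:
    """Rescale a series to mean 0 and standard deviation 1, as in Equation (4)."""
    mu, sigma = fmean(x), pstdev(x)
    return [(v - mu) / sigma for v in x]


def lagged_slope(leader: list[float], follower: list[float], lag: int) -> float:
    """No-intercept OLS slope from regressing follower[t + lag] on leader[t]."""
    x = leader[: len(leader) - lag]
    y = follower[lag:]
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


# Simulated weekly returns: small[t] responds to big[t-1], but not the reverse.
rng = random.Random(0)
T = 5000
big_raw = [rng.gauss(0.0, 1.0) for _ in range(T)]
small_raw = [rng.gauss(0.0, 1.0)] + [
    0.3 * big_raw[t - 1] + rng.gauss(0.0, 0.95) for t in range(1, T)
]

big, small = standardize(big_raw), standardize(small_raw)
beta_1 = lagged_slope(big, small, lag=1)   # big leads small, Equation (5)
gamma_1 = lagged_slope(small, big, lag=1)  # small leads big, Equation (6)
```

In this simulation `beta_1` should come out close to $0.3$ and `gamma_1` close to $0$, which is the same asymmetry the regressions are designed to detect. The sketch does not compute standard errors; a regression package would report those alongside the slope.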

The figure above gives the results of these regressions using data from January 1963 to December 2001. The solid blue and red lines give the point estimates for $\beta(l)$ and $\gamma(l)$, respectively, at lags of $l=0,1,2,\ldots,6$ weeks. The shaded regions around the solid lines are the $95{\scriptstyle \%}$ confidence intervals around these point estimates. e.g., the panel in the upper left-hand corner reports that when the largest securities in the consumer non-durables industry realize a return that is $1$ standard deviation above their mean in week $t$, the smallest securities in the consumer non-durables industry realize a return that is roughly $0.30$ standard deviations above their mean in week $(t+1)$. By contrast, the smallest consumer non-durables securities have no predictive power over the future returns of their larger cousins.

4. Robustness Checks

The above results are quite interesting, but no one really uses the Ken French industry classification system when trading. The industry standard is GICS. The figure below replicates these same results over the period from January 2000 to December 2013 using the GICS codes. The results are similar, but slightly less pronounced. This replication suggests that shocks to the largest securities in an industry take roughly $2$ weeks to fully propagate out to the smallest securities in the same industry.

An obvious follow-up question is: “Is there something special about the largest firms in an industry? Or, is this cross-autocorrelation a statistical effect?” One way to shed light on this question is to look at the predictive power of the largest $10{\scriptstyle \%}$ of securities in each industry as opposed to the largest $30{\scriptstyle \%}$:

(7) $\begin{align*} \tilde{r}_{i,t}^B &= \frac{1}{N_{i,y_t}^{10\%}} \cdot \sum_{n=1}^{N_{i,y_t}^{10\%}} \tilde{r}_{n,t} \qquad \text{and} \qquad \tilde{r}_{i,t}^S = \frac{1}{N_{i,y_t} - N_{i,y_t}^{70\%}} \cdot \sum_{n=N_{i,y_t}^{70\%} + 1}^{N_{i,y_t}} \tilde{r}_{n,t} \end{align*}$

If there is something fundamental about size, we should expect to see an even more pronounced disparity between the predictive power of the big and small portfolios. However, the figure below shows that looking at the predictive power of the really large firms (if anything) weakens the effect. It’s definitely not more pronounced.

5. Conclusion

What’s going on here? If size isn’t the root explanation, what is? In my paper Feature Selection Risk, I propose that the true culprit is not size, but rather the number of plausible shocks that might explain a firm’s returns. e.g., Apple might have really low stock returns in the current week for all sorts of reasons: a bad product release, news about factory conditions, a raw materials price shock, etc. Only some of these shocks will be relevant for other firms in the industry. It takes a while to parse Apple’s bad returns and figure out how you should extrapolate to other firms. By contrast, there are far fewer ways for a small firm’s returns to go very badly in the space of a few days, and often the reason is firm-specific.