Many Assets with Attribute-Specific Shocks

1. Motivation and Outline

Asset pricing models tend to focus on a single stock that realizes a normally distributed value shock of unspecified origin; think of Kyle (1985) as a representative example. This is a great starting point; however, massive size and dense interconnectedness are key features of financial markets. Studying a financial market without these features is like studying dry water. In this post I suggest a simple way to modify the standard payout structure to allow for many assets and attribute-specific shocks.

What do I mean by attribute-specific shocks? To illustrate, have a look at the figure below, which shows the most common 25{\scriptstyle \%} of topics that came into play when journalists from the Wall Street Journal wrote about Micron Technology from 2001 to 2012. The figure reads that: “If you select at random a Wall Street Journal article that mentioned Micron Technology in the abstract, then there is a 9{\scriptstyle \%} chance that ‘Antitrust’ is a listed subject.” Here’s the key point. When news about Micron Technology emerged, it was never just about Micron Technology. Journalists wrote about a particular SEC investigation, or a technology shock affecting all hard disk drive makers, or the firms currently active in the mergers and acquisitions market, and so on. Value shocks are physical. They are rooted in particular events affecting subsets of stocks.

[Figure: micron-search-subjects]

A big market with attribute-specific shocks means perspective matters. Consider a real-world example: Khandani and Lo (2007) wrote about the ‘Quant Meltdown’ of 2007 that “the most remarkable aspect of these hedge-fund losses was the fact that they were confined almost exclusively to funds using quantitative strategies. With laser-like precision, model-driven long/short equity funds were hit hard on Tue Aug 7th and Wed Aug 8th, despite relatively little movement in [the average level of] fixed-income and equity markets during those 2 days and no major losses reported in any other hedge-fund sectors.” Every individual stock was priced correctly, yet there was still a huge multi-stock price movement in a particular subset of stocks. Here’s the kicker: you would never have noticed this shock unless you knew exactly where to look!

2. Payout Structure

In Kyle (1985) there is a single stock with a fundamental value distributed as v \overset{\scriptscriptstyle \mathrm{iid}}{\sim} \mathrm{N}(0,\sigma_v^2). Suppose that, instead, there are actually N stocks that each have H different payout-relevant characteristics. Every characteristic can take on I distinct levels. I call a (characteristic, level) pairing an ‘attribute’ and use the indicator variable a_n(h,i) to denote whether or not a stock has an attribute. Think about attributes as sitting in a (H \times I)-dimensional matrix, \mathbf{A}, as illustrated in Equation (1) below:

(1)   \begin{equation*}   \mathbf{A}^{\top} = \bordermatrix{     ~      & 1                              & 2                         & \cdots & H                            \cr     1      & \text{Agriculture}             & \text{Albuquerque}        & \cdots & \text{Alcoa Inc}             \cr     2      & \text{Apparel}                 & \textbf{\color{red}Boise} & \cdots & \text{ConocoPhillips}        \cr     3      & \textbf{\color{red}Disk Drive} & \text{Chicago}            & \cdots & \text{Dell Inc} \cr     \vdots & \vdots                         & \vdots                    & \ddots & \vdots \cr     I      & \text{Wholesale}               & \text{Vancouver}          & \cdots & \textbf{\color{red}Xerox Corp} \cr} \end{equation*}

I’ve highlighted the attributes for Micron Technology. e.g., we have that a_{\text{Mcrn}}(\text{City},\text{Boise}) = 1 while a_{\text{WDig}}(\text{City},\text{Boise}) = 0 since Micron Technology is based in Boise, ID while Western Digital is based in Southern California.

Further, suppose that each stock’s value is then the sum of a collection of attribute-specific shocks:

(2)   \begin{align*} v_n &= \sum_{h,i} x(h,i) \cdot a_n(h,i) \end{align*}

where the shocks are distributed according to the rule:

(3)   \begin{align*} x(h,i) &= x^+(h,i) + x^-(h,i) \quad \text{with each} \quad x^\pm(h,i) \overset{\scriptscriptstyle \mathrm{iid}}{\sim}  \begin{cases} \pm \sfrac{\delta}{\sqrt{H}} &\text{ w/ prob } \pi \\ \ \: \, 0 &\text{ w/ prob } (1 - \pi) \end{cases} \end{align*}

Each x(h,i) records whether or not attribute (h,i) happened to realize a shock and, if so, the shock’s sign. The \sfrac{\delta}{\sqrt{H}} > 0 term represents the amplitude of every shock in units of dollars per share, and the \pi term represents the probability that attribute (h,i) realizes a positive shock each period, and likewise the probability that it realizes a negative one.
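To make the payout structure concrete, here is a minimal simulation sketch. All parameter values are my own illustrative choices, not from the post, and I assume each stock draws one level per characteristic uniformly and independently at random. It draws the shocks in Eq. (3) and builds firm values via Eq. (2):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameter choices (my own, not from the post).
N, H, I = 100, 50, 10       # stocks, characteristics, levels per characteristic
delta, pi = 1.0, 0.05       # shock amplitude scale and shock probability
amp = delta / np.sqrt(H)

# Each stock draws one level per characteristic uniformly at random,
# so a_n(h, i) = 1 exactly when levels[n, h] == i.
levels = rng.integers(0, I, size=(N, H))

# Attribute-specific shocks per Eq. (3): x = x_plus + x_minus, where each
# piece is +/- delta/sqrt(H) with probability pi and 0 otherwise.
x_plus = amp * (rng.random((H, I)) < pi)
x_minus = -amp * (rng.random((H, I)) < pi)
x = x_plus + x_minus

# Firm values per Eq. (2): v_n sums the shocks on stock n's own attributes.
v = x[np.arange(H), levels].sum(axis=1)

print(v.shape, round(float(v.std()), 3))
```

Storing one level per (stock, characteristic) pair rather than the full N-by-H-by-I indicator array keeps the sketch small; the fancy-indexing line `x[np.arange(H), levels]` looks up each stock’s own H attribute shocks in one step.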

You could also add the usual factor exposure and firm-specific shocks to the model:

(4)   \begin{align*} v_n &= \cancelto{0}{{\boldsymbol \theta}_n^{\top} \mathbf{f}} + \sum_{h,i} x(h,i) \cdot a_n(h,i) + \cancelto{0}{\epsilon_n} \end{align*}

I’ve excluded these terms for clarity since they are not new. You might be wondering: “Aren’t these attribute-specific shocks captured by a covariance matrix, though?” No. The covariance between any 2 assets in this setup is:

(5)   \begin{align*} \mathrm{Cov}\left[v_n,v_{n'}\right] = H \cdot \left(\sfrac{1}{I}\right) \cdot 2 \cdot \pi \cdot (1-\pi) \cdot \left( \sfrac{\delta}{\sqrt{H}} \right)^2 = 2 \cdot \pi \cdot (1 - \pi) \cdot \sfrac{\delta^2}{I} \end{align*}

where the first H corresponds to the number of characteristics, the \sfrac{1}{I} term denotes the probability that both stocks share the same level of a given characteristic, the 2 \cdot \pi \cdot (1 - \pi) term denotes the probability that the attribute realizes a (non-canceling) shock, and the (\sfrac{\delta}{\sqrt{H}})^2 term denotes the squared attribute-specific shock amplitude. The takeaway from this calculation is that the covariance matrix is completely flat (i.e., it doesn’t matter which n and n' you compare) and arbitrarily small as the number of levels per characteristic grows.
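Here is a quick Monte Carlo sanity check of the flatness claim. The parameters and the uniform, independent attribute-assignment rule are my own assumptions. Drawing many independent worlds and computing the sample covariance of firm values across them shows that every off-diagonal entry estimates the same small number, well below the common variance on the diagonal:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative parameters (my own choices, not from the post).
N, H, I = 20, 50, 10
delta, pi = 1.0, 0.05
amp = delta / np.sqrt(H)

# Draw many independent worlds: in each, redraw both the attribute
# assignments and the shocks, then record every stock's value.
T = 5000
V = np.empty((T, N))
for t in range(T):
    levels = rng.integers(0, I, size=(N, H))
    x = amp * ((rng.random((H, I)) < pi).astype(float)
               - (rng.random((H, I)) < pi))
    V[t] = x[np.arange(H), levels].sum(axis=1)

# Ex ante, the covariance matrix is flat: no pair (n, n') stands out.
C = np.cov(V, rowvar=False)
off = C[~np.eye(N, dtype=bool)]
print(round(float(off.mean()), 4), round(float(C.diagonal().mean()), 4))
```

With these parameters the off-diagonal entries all cluster around a single value roughly an order of magnitude below the diagonal, and their spread is consistent with pure sampling noise.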

[Figure: plot-maximum-industry-specific-volatility]

Lots of things that you might think of as explained by a constant covariance aren’t. e.g., the figure above shows the maximum industry-specific contribution to daily return variance from January 1976 to December 2011 using the methodology in Campbell, Lettau, Malkiel, and Xu (2001). The vertical text at the bottom gives the name of the industry with the largest industry-specific contribution to daily return variance each month whenever it changes from the previous month. The figure reads that: “While traders can usually expect to understand no more than 5{\scriptstyle \%/\mathrm{yr}} of a typical firm’s 20{\scriptstyle \%/\mathrm{yr}} variation in daily returns, there are times such as in 1987 when this 5{\scriptstyle \%/\mathrm{yr}} figure suddenly jumps to over 25{\scriptstyle \%/\mathrm{yr}}. What’s more, the density of the text along the base of the figure shows that the important (i.e., extremal) industry regularly changes from month to month.”

3. Approximation Error

One of the nice features of this reformulation of the usual normal value shocks is that, although it changes the interpretation of where each firm’s value comes from, it doesn’t alter any of the Gaussian structure of the problem. i.e., the normal approximation to the binomial distribution says that:

(6)   \begin{align*} \sum_{h,i} x(h,i) \cdot a_n(h,i) \overset{\scriptscriptstyle \text{``ish''}}{\sim} \mathrm{N}(0, \sigma_v^2) \quad \text{where} \quad \sigma_v^2 = 2 \cdot \delta^2 \cdot \pi \cdot (1 - \pi) \end{align*}

where the “ish” means that there is a small and easy-to-compute approximation error. e.g., consider the collection of attribute-specific shocks for asset n, \{x_1,x_2,\ldots,x_H\}, with \mathrm{E}[x_h] = 0, \mathrm{E}[x_h^2] = \sigma_v^2 > 0, and \mathrm{E}[|x_h|^3] = \rho_v < \infty and define the normalized sum X(H) = \sfrac{1}{(\sigma_v \cdot \sqrt{H})} \cdot \sum_h x_h with the cumulative distribution function F_H(x) = \mathrm{Pr}[X(H) \leq x]. Then, we know via the central limit theorem that F_H(x) \to \Phi(x) as H \to \infty where \Phi(\cdot) is the standard normal CDF.
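A quick simulation (the parameter values are my own choices) confirms the variance in Eq. (6). Since the counts of positive and negative shocks hitting a single firm are each Binomial(H, \pi), firm values can be drawn directly without building the whole attribute matrix:

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative parameters (my own choices).
H, pi, delta = 1000, 0.01, 1.0
amp = delta / np.sqrt(H)

# For one firm, only the H shocks on its own attributes matter, and the
# counts of positive and negative shocks are each Binomial(H, pi), so
# v = (delta / sqrt(H)) * (k_plus - k_minus).
T = 200_000
v = amp * (rng.binomial(H, pi, T) - rng.binomial(H, pi, T))

sigma2 = 2 * delta**2 * pi * (1 - pi)   # Eq. (6)'s predicted variance
print(round(float(v.var()), 4), round(float(sigma2), 4))
```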

[Figure: normal-approx-to-binom]

Moreover, the Berry-Esseen Theorem says that:

(7)   \begin{align*} \max_{x \in \mathrm{R}}\left\{ \ \left| F_H(x) - \Phi(x) \right| \ \right\} &\leq \frac{0.50 \cdot \rho_v}{\sigma_v^3 \cdot \sqrt{H}} = \frac{0.50}{\sqrt{2 \cdot H \cdot \pi \cdot (1 - \pi)}} \end{align*}

where the second equals sign applies only in the special case of the sum of 2 binomially distributed random variables. The figure above shows how well this approximation holds as the number of payout-relevant characteristics, H, increases from 100 to 10000 in a world where \pi = \sfrac{1}{100}. I compute the x-axis on a grid with spacing \Delta x = \sfrac{1}{100}. If there are 100 firms with values whose variance is \sigma_v^2 = \mathdollar 100{\scriptstyle /\mathrm{sh}} (i.e., \sigma_v = \mathdollar 10{\scriptstyle /\mathrm{sh}}), then in a world with H = 1000 payout-relevant characteristics only 6 stocks will be misvalued, each by a mere \Delta x \cdot \sigma_v \cdot \sqrt{H} \approx \mathdollar 3{\scriptstyle /\mathrm{sh}}, if you use the normal approximation to the binomial distribution rather than the true distribution. Thus, less than 1 dollar in every 500 goes unaccounted for by the approximation:

(8)   \begin{align*}  0.0018 &= \frac{6{\scriptstyle \mathrm{stocks}} \cdot \mathdollar 3{\scriptstyle /\mathrm{sh}}}{100{\scriptstyle \mathrm{stocks}} \cdot \sigma_v^2}  \end{align*}
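Eq. (7)’s bound is also easy to check numerically. Here is a sketch with \pi = \sfrac{1}{100}, where the sample size, seed, and \delta = 1 are my own choices, comparing the empirical CDF of the normalized sum against \Phi for several values of H:

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(3)

pi, T = 0.01, 200_000
for H in (100, 1000, 10000):
    # Right-hand side of the Berry-Esseen bound in Eq. (7).
    bound = 0.50 / sqrt(2 * H * pi * (1 - pi))

    # Simulated values: the counts of positive and negative shocks are
    # each Binomial(H, pi), so v = (delta / sqrt(H)) * (k_plus - k_minus)
    # with delta = 1.
    v = (rng.binomial(H, pi, T) - rng.binomial(H, pi, T)) / sqrt(H)
    z = np.sort(v / sqrt(2 * pi * (1 - pi)))  # standardize by sigma_v

    # Maximum gap between the empirical CDF and the standard normal CDF.
    F = np.arange(1, T + 1) / T
    Phi = np.array([0.5 * (1 + erf(c / sqrt(2))) for c in z])
    gap = float(np.max(np.abs(F - Phi)))
    print(f"H={H:>5}  gap={gap:.3f}  bound={bound:.3f}")
```

In each case the measured gap sits comfortably inside the bound, and both shrink as H grows, which is exactly the 1/\sqrt{H} behavior the theorem promises.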

By contrast, the figure below shows the 12{\scriptstyle \mathrm{mo}} moving average of the percent of the variance in firm-level daily returns explained by market and industry factors over the time period from January 1976 to December 2011 using the methodology from Campbell, Lettau, Malkiel, and Xu (2001). This figure reads that: “For a randomly selected stock in 1999, market and industry considerations only account for around 30{\scriptstyle \%} of its daily return variation.” In other words, the usual factor models typically account for less than half of the fluctuations in firm value. i.e., the variation they leave unexplained is 2 orders of magnitude larger than the approximation error above!

[Figure: plot-market-and-industry-r2-series]

4. Whose Perspective?

You might ask: “Why bother adding this extra structure?” In a big market with attribute-specific shocks, perspective matters. This is the punchline. Asset values and attribute-specific shocks essentially carry the same information since:

(9)   \begin{align*} v_n &= \sum_{h,i} x(h,i) \cdot a_n(h,i) + \mathrm{O}(H^{-\sfrac{1}{2}}) \\ x(h,i) &= \frac{1}{\sfrac{N}{I}} \cdot \sum_n v_n \cdot a_n(h,i) + \mathrm{O}\left((\sfrac{N}{I})^{-\sfrac{1}{2}}\right) \end{align*}

However, knowing the value of an asset tells you very little about whether any particular one of its attributes has realized a shock. Similarly, knowing whether an attribute has realized a shock is a really noisy signal about the value of any particular stock with that attribute.
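This duality is easy to see in simulation. The sketch below, with all parameters hypothetical, recovers every attribute-specific shock from firm values alone by averaging v_n over the roughly N/I stocks that carry each attribute, per the second line of Eq. (9):

```python
import numpy as np

rng = np.random.default_rng(4)

# Illustrative parameters (my own): about N/I = 5000 stocks per attribute.
N, H, I = 50_000, 100, 10
delta, pi = 1.0, 0.02
amp = delta / np.sqrt(H)

# Uniform, independent attribute assignment plus Eq. (3)'s shocks.
levels = rng.integers(0, I, size=(N, H))
x = amp * ((rng.random((H, I)) < pi).astype(float)
           - (rng.random((H, I)) < pi))
v = x[np.arange(H), levels].sum(axis=1)

# Recover every attribute-specific shock from values alone by averaging
# v_n over the stocks carrying attribute (h, i), as in Eq. (9).
x_hat = np.empty((H, I))
for h in range(H):
    for i in range(I):
        x_hat[h, i] = v[levels[:, h] == i].mean()

# The worst-case estimation error is tiny next to the shock
# amplitude delta / sqrt(H), so shocked attributes stand out clearly.
err = float(np.abs(x_hat - x).max())
print(round(err, 3), "vs shock amplitude", round(float(amp), 3))
```

Note the asymmetry the text describes: each x_hat[h, i] averages away thousands of stock-level values, yet no single v_n on its own says much about any one x(h, i).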

To see how this duality might affect asset prices, consider a simple example. e.g., suppose that we are in a multi-period Kyle (1985)-type world where value investors know the fundamental value of a particular stock, and they place orders with a market maker who processes only the order flow for that particular stock. It could well be the case that market makers price each stock correctly on average:

(10)   \begin{align*} \mathrm{E} \left[ \ p_{n,t} - v_n \ \middle| \ y_{n,t} \ \right] &= 0 \end{align*}

Yet, the high dimensionality of the market would mean that there still could be groups of mispriced stocks:

(11)   \begin{align*} \mathrm{E} \left[ \ \langle p_{n,t} \rangle_{h,i} - \langle v_n \rangle_{h,i} \ \middle| \ x(h,i) = \sfrac{\delta}{\sqrt{H}} \ \right] &< 0 \; \text{and} \; \mathrm{E} \left[ \ \langle p_{n,t} \rangle_{h,i} - \langle v_n \rangle_{h,i} \ \middle| \ x(h,i) = - \sfrac{\delta}{\sqrt{H}} \ \right] > 0 \end{align*}

where \langle p_{n,t} \rangle_{h,i} = \sfrac{I}{N} \cdot \sum_n p_{n,t} \cdot a_n(h,i) denotes the sample average time-t price for stocks with a particular attribute, (h,i). This is a case of more is different. If an oracle told you that x(h,i) = \sfrac{\delta}{\sqrt{H}} for some attribute (h,i), then you would know that the average price of stocks with attribute (h,i) would be:

(12)   \begin{align*} \langle p_{n,t} \rangle_{h,i} &= \langle \lambda_{n,t} \cdot \beta_{n,t} \rangle_{h,i}  \cdot \frac{\delta}{\sqrt{H}} + \mathrm{O}\left((\sfrac{N}{I})^{-\sfrac{1}{2}}\right) \end{align*}

where \langle \lambda_{n,t} \cdot \beta_{n,t}\rangle_{h,i}  < 1 since value investors would have an incentive to delay trading in a dynamic model. i.e., \langle p_{n,t} \rangle_{h,i} will be less than its fundamental value \langle v_n \rangle_{h,i} = \sfrac{\delta}{\sqrt{H}} even though it will be easy to see that \langle p_{n,t} \rangle_{h,i} \neq 0 as \sfrac{I}{N} \to 0.

There are way more payout-relevant attributes than anyone could ever investigate in a single period. This is why Charlie Munger explains that it’s his job “to find a few intelligent things to do, not to keep up with every damn thing in the world.” If we think about each stock as a location in a “spatial” domain and the attribute-specific shocks as particular points in a “frequency” domain, this result takes on the flavor of a generalized uncertainty principle. i.e., it’s really hard to simultaneously estimate the price of a portfolio at both very fine scales (i.e., containing a single asset) and very low frequencies (i.e., affecting every stock with an attribute).