The Secrets N Prices Keep

1. Introduction

Prices are signals about shocks to fundamentals. In a world where there are many stocks and lots of different kinds of shocks to fundamentals, traders are often more concerned with identifying exactly which shocks took place than with the value of any particular asset. e.g., imagine you are a day trader. While you certainly care about changes in the fundamental value of Apple stock, you care much more about the size and location of the underlying shocks since you can profit from this information elsewhere. On one hand, if all firms based in California were hit with a positive shock, you might want to buy shares of Apple, Banana Republic, Costco, …, and Zero Skateboards stock. On the other hand, if all electronic equipment companies were hit with a positive shock, you might want to buy up Apple, Bose, Cisco Systems, …, and Zenith shares instead.

It turns out that there is a sharp phase change in traders’ ability to draw inferences about attribute-specific shocks from prices. i.e., when there have been fewer than N^\star transactions, you can’t tell exactly which shocks affected Apple’s fundamental value. Even if you knew that Apple had been hit by some shock, with fewer than N^\star observations you couldn’t tell whether it was a California-specific event or an electronic equipment-specific event. By contrast, when there have been more than N^\star transactions, you can figure out exactly which shocks have occurred. The additional (N - N^\star) transactions simply allow you to fine-tune your beliefs about exactly how large the shocks were. The surprising result is that N^\star is a) independent of traders’ cognitive abilities and b) easily calculable via tools from the compressed sensing literature. See my earlier post for details.

This signal recovery bound is thus a novel constraint on the amount of information that real world traders can extract from prices. Moreover, the bound gives a concrete meaning to the term “local knowledge”. e.g., shocks that haven’t yet manifested themselves in N^\star transactions are local in the sense that no one can spot them through prices. Anyone who knows of their existence must have found out via some other channel. To build intuition, this post gives 3 examples of this constraint in action.

2. Out-of-Town House Buyer

First I show where this signal recovery bound comes from. People spend lots of time looking for houses in different cities. e.g., see Trulia or my paper. Suppose you moved away from Chicago a year ago, and now you’re moving back and looking for a house. When studying a list of recent sale prices, you find yourself a bit surprised. People must have changed their preferences for 1 of 7 different amenities: ^{(1)}a 2 car garage, ^{(2)}a 3rd bedroom, ^{(3)}a half-circle driveway, ^{(4)}granite countertops, ^{(5)}energy efficient appliances, ^{(6)}central A/C, or ^{(7)}a walk-in closet. Having the mystery amenity raises the sale price by \beta > 0 dollars. You would know how preferences had evolved if you had lived in Chicago the whole time; however, in the absence of this local knowledge, how many sales would you need to see in order to figure out which of the 7 amenities mattered?

The answer is 3. How did I come up with this number? For ease of explanation, let’s normalize expected house prices to \mathrm{E}_{t-1}[p_{n,t}] = 0. Suppose you found one house with amenities \{1,3,5,7\}, a second house with amenities \{2, 3, 6, 7\}, and a third house with amenities \{4, 5, 6,7\}. The combination of prices for these 3 houses would reveal exactly which amenity had been shocked. i.e., if only the first house’s price was higher than expected, p_{1,t} \approx \beta, then Chicagoans must have changed their preferences for having a 2 car garage:

(1)   \begin{equation*} {\small  \begin{bmatrix} p_{1,t} \\ p_{2,t} \\ p_{3,t} \end{bmatrix}  = \begin{bmatrix} \beta \\ 0 \\ 0 \end{bmatrix}  =  \begin{bmatrix}  1 & 0 & 1 & 0 & 1 & 0 & 1  \\  0 & 1 & 1 & 0 & 0 & 1 & 1  \\  0 & 0 & 0 & 1 & 1 & 1 & 1  \end{bmatrix} \begin{bmatrix}  \beta \\ 0 \\ \vdots \\ 0  \end{bmatrix} + \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \epsilon_3 \end{bmatrix}  } \quad \text{with} \quad \epsilon_n \overset{\scriptscriptstyle \mathrm{iid}}{\sim} \mathrm{N}(0,\sigma^2), \, \beta \gg \sigma \end{equation*}

By contrast, if it was the case that p_{1,t} \approx \beta, p_{2,t} \approx \beta, and p_{3,t} \approx \beta, then you would know that people now value walk-in closets much more than they did a year ago.

Here is the key point. Each sale gives you 1 yes or no answer, so 3 sales provide just enough information to distinguish between the 7 possible amenity shocks while ruling out the possibility of no change:

(2)   \begin{align*}   7 = 2^3 - 1 \end{align*}

N = 4 sales simply narrows your error bars around the exact value of \beta. N = 2 sales only allows you to distinguish between subsets of amenities. e.g., seeing just the 1st and 2nd houses with unexpectedly high prices only tells you that people like either half-circle driveways or walk-in closets more. It doesn’t tell you which one. The problem changes character at N = N^\star(7,1) = 3. When you have seen fewer than N^\star = 3 sales, information about how preferences have changed is purely local knowledge. Prices can’t publicize this information. You must live and work in Chicago to learn it.
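The decoding logic above can be sketched in a few lines of Python. This is an illustrative reconstruction, not the post’s actual estimator: the \beta = \$10{,}000 shock size and \sigma = \$500 noise level are assumptions chosen so that \beta \gg \sigma holds, and the matrix is the same 3 \times 7 binary code as in equation (1).

```python
import numpy as np

# Each column of X is the 3-bit code for one of the 7 amenities: house 1 has
# amenities {1,3,5,7}, house 2 has {2,3,6,7}, and house 3 has {4,5,6,7}.
X = np.array([[1, 0, 1, 0, 1, 0, 1],
              [0, 1, 1, 0, 0, 1, 1],
              [0, 0, 0, 1, 1, 1, 1]])

beta, sigma = 10_000.0, 500.0        # assumed values: shock size dwarfs the noise
rng = np.random.default_rng(0)

def identify_amenity(shocked):
    """Read off which single amenity (1..7) three noisy sale prices reveal."""
    true_beta = np.zeros(7)
    true_beta[shocked - 1] = beta
    prices = X @ true_beta + rng.normal(0.0, sigma, size=3)
    # Threshold each unexpected price change at beta/2 to recover the 3-bit
    # code, then match it against the columns of X.
    code = (prices > beta / 2).astype(int)
    return 1 + next(q for q in range(7) if np.array_equal(X[:, q], code))

print([identify_amenity(q) for q in range(1, 8)])  # → [1, 2, 3, 4, 5, 6, 7]
```

Whichever amenity gets shocked, the pattern of surprising sale prices singles it out, exactly as in the discussion of equation (1).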

3. Industry Analyst’s Advantage

Next, I illustrate how this signal recovery bound acts like a cognitive constraint for would-be arbitrageurs. Suppose you’re a petroleum industry analyst. Through long, hard, caffeine-fueled nights of research you’ve discovered that oil companies such as Schlumberger, Halliburton, and Baker Hughes who’ve invested in hydraulic fracturing (a.k.a., “fracking”) are due for a big unexpected payout. This is really valuable information affecting only a few of the major oil companies. Many companies haven’t really invested in this technology, and they won’t be affected by the shock. How aggressively should you trade Schlumberger, Halliburton, and Baker Hughes? On one hand, you want to build up a large position in these stocks to take advantage of the future price increases that you know are going to happen. On the other hand, you don’t want to allow news of this shock to spill out to the rest of the market.

In the canonical Grossman and Stiglitz (1980)-type setup, the reason that would-be arbitrageurs can’t immediately infer your hard-earned information from prices is the existence of noise traders. They can’t be completely sure whether a sudden price movement is due to a) your informed trading or b) random noise trader demand. Here, I propose a new confound: the existence of many plausible shocks. e.g., suppose you start aggressively buying up shares of Schlumberger, Halliburton, and Baker Hughes stock. As an arbitrageur I see the resulting gradual price increases in these 3 stocks, and ask: “What should my next trade be?” Here’s where things get interesting. When there have been fewer than N^\star transactions in the petroleum industry, I can’t tell whether you are trading on a Houston, TX-specific shock or a fracking-specific shock since all 3 of these companies share both of these attributes. I need to see at least N^\star observations in order to recognize the pattern you’re trading on.


The figure above gives a sense of the number of different kinds of shocks that affect the petroleum industry. It reads: “If you select a Wall Street Journal article on the petroleum industry over the period from 2011 to 2013 there is a 19{\scriptstyle \%} chance that ‘Oil sands’ is a listed descriptor and a 7{\scriptstyle \%} chance that ‘LNG’ (i.e., liquid natural gas) is a listed descriptor.” Thus, oil stock price changes might be due to Q \gg 1 different shocks:

(3)   \begin{align*} \hat{p}_{n,t} &= p_{n,t} - \mathrm{E}_{t-1}[p_{n,t}] = \sum_{q=1}^Q \beta_{q,t} \cdot x_{n,q} + \epsilon_{n,t} \qquad \text{with} \qquad \epsilon_{n,t} \overset{\scriptscriptstyle \mathrm{iid}}{\sim} \mathrm{N}(0,\sigma^2) \end{align*}

where x_{n,q} denotes stock n's exposure to the qth attribute. e.g., in this example x_{n,q} = 1 if the company invested in fracking (i.e., like Schlumberger, Halliburton, and Baker Hughes) and x_{n,q}=0 if the company didn’t. What’s more, very few of the Q possible attributes matter each month. e.g., the plot below reads: “Only around 10{\scriptstyle \%} of all the descriptors in the Wall Street Journal articles about the petroleum industry over the period from January 2011 to December 2013 are used each month.” Thus, only K of the possible Q attributes appear to realize shocks each period:

(4)   \begin{align*} K &= \Vert {\boldsymbol \beta} \Vert_{\ell_0} = \sum_{q=1}^Q 1_{\{\beta_q \neq 0\}} \qquad \text{with} \qquad K \ll Q \end{align*}

Note that this calculation includes terms like ‘Crude oil prices’, which occur in roughly half the articles even though ‘Crude oil prices’ is really just a synonym for the industry itself. Excluding such catch-all descriptors, the actual rate is likely much lower.
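To see how the sparsity in equation (4) makes recovery possible at all, here is an illustrative Python sketch. The numbers (Q = 20 attributes, K = 2 shocks, unit-sized shocks) are scaled-down assumptions so that brute force over every candidate support is feasible, and exposures are drawn as continuous random numbers rather than 0/1 indicators purely to keep the example numerically well behaved:

```python
import itertools
import numpy as np

rng = np.random.default_rng(7)
Q, K = 20, 2
N = int(np.ceil(K * np.log(Q - K))) + 2   # a couple of sales above N* ~ 5.8

X = rng.standard_normal((N, Q))           # stocks' exposures to the Q attributes
true_support = (3, 11)                    # the K attributes that realized shocks
beta = np.zeros(Q)
beta[list(true_support)] = 1.0            # unit-sized shocks for simplicity
p_hat = X @ beta + 0.01 * rng.standard_normal(N)   # unexpected price changes

def best_support(X, y, k):
    """Exhaustively fit every size-k support and return the best-fitting one."""
    def rss(support):
        cols = X[:, list(support)]
        coef, *_ = np.linalg.lstsq(cols, y, rcond=None)
        return float(np.sum((y - cols @ coef) ** 2))
    return min(itertools.combinations(range(X.shape[1]), k), key=rss)

print(best_support(X, p_hat, K))  # → (3, 11)
```

With N above the signal recovery bound, the true pair of shocked attributes fits the observed price changes far better than any of the other \binom{20}{2} - 1 = 189 candidate explanations; with N well below it, several candidate supports would fit equally well.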


For simplicity, suppose that 10 attributes out of a possible 100 realized a shock in the previous period, and you discovered 1 of them. How long does your informational monopoly last? Using tools from Wainwright (2009), it’s easy to show that uninformed traders need at least:

(5)   \begin{align*} N^\star(100,10) \approx 10 \cdot \log(100 - 10) = 45  \end{align*}

observations to identify which 10 of the 100 possible payout-relevant attributes in the petroleum industry have realized a shock. If it takes you (…and other industry specialists like you) around 1 hour to materially increase your position, then, assuming an 8 hour trading day, you have roughly \sfrac{45}{8} = 5.6 days (i.e., around 1 trading week) to build up a position before the rest of the market catches on.

4. Asset Management Expertise

Finally, I show how there can be situations where you might not bother trying to learn from prices because there are too many plausible explanations to check out. In this world everyone specializes in acquiring local knowledge. Suppose you’re a wealthy investor, and I’m a broke asset manager with a trading strategy. I walk into your office, and I try to convince you to finance my strategy that has abnormal returns of r_t per month:

(6)   \begin{align*}   r_t &= \mu + \epsilon_t   \qquad \text{with} \qquad    \epsilon_t \overset{\scriptscriptstyle \mathrm{iid}}{\sim} \mathrm{N}(0,\sigma_{\epsilon}^2) \end{align*}

where \sigma_{\epsilon}^2 = 1{\scriptstyle \%} per month to make the algebra neat. For simplicity, suppose that there is no debate that \mu > 0. In return for running the trading strategy, I ask for fees amounting to a fraction f of the gross returns. Of course, I have to tell you a little bit about how the trading strategy works, so you can deduce that I’m taking on a position that is to some extent a currency carry trade and to some extent a short-volatility strategy. This narrows down the list a bit, but it still leaves a lot of possibilities. In the end, you know that I am using some combination of K = 2 out of Q = 100 possible strategies.

You have 2 options. On one hand, if you accept the terms of this offer and finance my strategy, you realize returns net of fees equal to:

(7)   \begin{align*}   (1 - f) \cdot \mu \cdot T + \sum_{t=1}^T \epsilon_t \end{align*}

This approach would net you an annualized Sharpe ratio of \text{SR}_{\text{mgr}} = \sqrt{12} \cdot (1 - f) \cdot \sfrac{\mu}{\sigma_{\epsilon}}. e.g., if I asked for a fee of f = 20{\scriptstyle \%}, and my strategy yielded a return of \mu = 2{\scriptstyle \%} per month, then your annualized Sharpe ratio net of my fees would be \text{SR}_{\text{mgr}} = 0.55.

On the other hand, you could always refuse my offer and try to back out which strategies I was following using the information you gained from our meeting. i.e., you know that my strategy involves using some combination of K=2 factors out of a universe of Q = 100 possibilities:

(8)   \begin{align*}   \mu &= \sum_{q=1}^{100} \beta_q \cdot x_{q,t}   \qquad \text{with} \qquad    \Vert {\boldsymbol \beta} \Vert_{\ell_0} = 2 \end{align*}

In order to deduce which strategies I was using as quickly as possible, you’d have to trade random portfolio combinations of these 100 different factors for:

(9)   \begin{align*}   T^\star(100,2) \approx 2 \cdot \log(100 - 2) = 9.17 \, {\scriptstyle \mathrm{months}} \end{align*}

Your Sharpe ratio during this period would be \text{SR}_{\text{w/o mgr}|\text{pre}} = 0, and afterwards you would earn the same Sharpe ratio as before without having to pay any fees to me:

(10)   \begin{align*}   \text{SR}_{\text{w/o mgr}|\text{post}} &= \sqrt{12} \cdot \left( \frac{0.02}{0.10} \right) = 0.69 \end{align*}

However, if you have to show your investors reports every year, it may not be worth it for you to reverse engineer my trading strategy. Your average Sharpe ratio during this period would be:

(11)   \begin{align*}   \text{SR}_{\text{w/o mgr}} &= \sfrac{9.17}{12} \cdot 0 + \sfrac{(12 - 9.17)}{12} \cdot 0.69 = 0.16 \end{align*}
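The full Sharpe ratio comparison in equations (7) through (11) can be verified with a few lines of Python, using the numbers assumed in the text (\mu = 2{\scriptstyle \%} and \sigma_{\epsilon} = 10{\scriptstyle \%} per month, a fee of f = 20{\scriptstyle \%}, and K = 2 strategies out of Q = 100):

```python
import numpy as np

mu, sigma, f = 0.02, 0.10, 0.20   # monthly mean, monthly vol, manager's fee
Q, K = 100, 2

sr_mgr = np.sqrt(12) * (1 - f) * mu / sigma   # pay the manager's fees
t_star = K * np.log(Q - K)                    # months spent reverse engineering
sr_post = np.sqrt(12) * mu / sigma            # fee-free SR once you've caught on
sr_solo = (12 - t_star) / 12 * sr_post        # average SR over the first year

print(round(sr_mgr, 2))    # → 0.55
print(round(t_star, 2))    # → 9.17
print(round(sr_post, 2))   # → 0.69
print(round(sr_solo, 2))   # → 0.16
```

The comparison that matters for the story is 0.55 vs 0.16: going it alone beats paying fees eventually, but not over a 1 year evaluation horizon.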

which is well below the Sharpe ratio on the market portfolio. Thus, you may just want to pay my fees. Even though you could in principle back out which strategies I was using, it would take too long. Your investors would withdraw due to poor performance before you could capitalize on your newfound knowledge.

5. Discussion

To cement ideas, let’s think about what this result implies for a financial econometrician. We’ve known since the 1970s that there is a strong relationship between oil shocks and the rest of the economy. e.g., see Hamilton (1983), Lamont (1997), and Hamilton (2003). Imagine you’re now an econometrician, and you go back and pinpoint the exact hour when each fracking news shock occurred over the last 40 years. Using this information, you then run an event study which finds that petroleum stocks affected by each news shock display a positive cumulative abnormal return over the course of the following week. Would this be evidence of a market inefficiency? Are traders still under-reacting to oil shocks? No. Ex post event studies assume that traders know exactly what is and what isn’t important in real time. Non-petroleum industry specialists who didn’t lose sleep researching hydraulic fracturing have to parse out which shocks are relevant from prices alone. This takes time. In the interim, this knowledge is local.