How Quickly Can We Decipher Price Signals?

1. Introduction

There are many different attribute-specific shocks that might affect an asset’s fundamental value in any given period. e.g., the prices of all stocks held in model-driven long/short equity funds might suddenly plummet, as happened in the Quant Meltdown of August 2007. Alternatively, new city parking regulations might raise the value of homes with a half-circle driveway. Innovations in asset prices are signals containing two different kinds of information: (a) which of these Q different shocks has taken place and (b) how big each of them was.

It’s often a challenge for traders to answer question (a) in real time. e.g., Daniel (2009) notes that during the Quant Meltdown “markets appeared calm to non-quantitative investors… you could not tell that anything was happening without quant goggles.” This post asks the question: How many transactions do traders need to see in order to identify shocked attributes? The surprising result is that there is a well-defined and calculable answer to this question that is independent of traders’ cognitive abilities. Local knowledge is an unavoidable consequence of this location recovery bound.

2. Motivating Example

It’s easiest to see where this location recovery bound comes from via a short example. Suppose you moved away from Chicago a year ago, and now you’re moving back and looking for a house. When looking at a list of recent sale prices, you find yourself surprised. People must have changed their preferences for 1 of 7 different amenities: ^{(1)}a 2-car garage, ^{(2)}a 3rd bedroom, ^{(3)}a half-circle driveway, ^{(4)}granite countertops, ^{(5)}energy-efficient appliances, ^{(6)}central A/C, or ^{(7)}a walk-in closet. Having the mystery amenity raises the sale price by \beta > 0 dollars. To be sure, you would know how preferences had evolved if you had lived in Chicago the whole time; however, in the absence of this local knowledge, how many sales would you need to see in order to figure out which of the 7 amenities mattered?

The answer is 3. Where does this number come from? For ease of explanation, let’s normalize the expected price of every house to \mathrm{E}_{t-1}[p_{n,t}] = 0. Suppose you found one house with amenities \{1,3,5,7\}, a second house with amenities \{2, 3, 6, 7\}, and a third house with amenities \{4, 5, 6, 7\}. The combination of prices for these 3 houses would reveal exactly which amenity had been shocked. i.e., if only the first house’s price was higher than expected, p_{1,t} \approx \beta, then Chicagoans must have changed their preferences for having a 2-car garage:

(1)   \begin{equation*} {\small  \begin{bmatrix} p_{1,t} \\ p_{2,t} \\ p_{3,t} \end{bmatrix}  = \begin{bmatrix} \beta \\ 0 \\ 0 \end{bmatrix}  =  \begin{bmatrix}  1 & 0 & 1 & 0 & 1 & 0 & 1  \\  0 & 1 & 1 & 0 & 0 & 1 & 1  \\  0 & 0 & 0 & 1 & 1 & 1 & 1  \end{bmatrix} \begin{bmatrix}  \beta \\ 0 \\ \vdots \\ 0  \end{bmatrix} + \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \epsilon_3 \end{bmatrix}  } \quad \text{with} \quad \epsilon_n \overset{\scriptscriptstyle \mathrm{iid}}{\sim} \mathrm{N}(0,\sigma^2), \, \beta \gg \sigma \end{equation*}

By contrast, if it was the case that p_{1,t} \approx \beta, p_{2,t} \approx \beta, and p_{3,t} \approx \beta, then you would know that people now value walk-in closets much more than they did a year ago.

Here is the key point. 3 sales give just enough yes-or-no answers to single out which of the 7 amenities was shocked while still ruling out the possibility of no change at all:

(2)   \begin{align*}   7 = 2^3 - 1 \end{align*}

N = 4 sales simply narrows your error bars around the exact value of \beta. N = 2 sales only allows you to distinguish between subsets of amenities. e.g., seeing just the 1st and 2nd houses with unexpectedly high prices only tells you that people like either half-circle driveways or walk-in closets more. It doesn’t tell you which one. The problem changes character at N = N^\star(7,1) = 3… i.e., the location recovery bound.
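The decoding logic in this example can be sketched in a few lines of Python. This is a toy simulation, so the shock size and noise level below are illustrative assumptions; the amenity sets are the ones from the example, which amount to a 3-bit binary code for the 7 amenities:

```python
import numpy as np

rng = np.random.default_rng(0)

# Amenity sets from the example: house n contains amenity q exactly when
# the n-th bit of q's binary representation is 1
X = np.array([
    [1, 0, 1, 0, 1, 0, 1],   # house 1: amenities {1, 3, 5, 7}
    [0, 1, 1, 0, 0, 1, 1],   # house 2: amenities {2, 3, 6, 7}
    [0, 0, 0, 1, 1, 1, 1],   # house 3: amenities {4, 5, 6, 7}
])

beta_size, sigma = 10_000.0, 100.0   # shock size >> noise (illustrative values)
shocked = 2                          # suppose amenity 3, the half-circle driveway (0-indexed)
beta = np.zeros(7)
beta[shocked] = beta_size

# Observed price surprises for the three houses
p = X @ beta + rng.normal(0.0, sigma, size=3)

# Decode: each house's surprise is (approximately) either 0 or beta, so the
# pattern of "surprised vs. not surprised" is a 3-bit code for the amenity
bits = (p > beta_size / 2).astype(int)
decoded = int(np.dot(bits, [1, 2, 4])) - 1   # back to a 0-indexed amenity

print(decoded)   # 2, i.e. the half-circle driveway
```

Because \beta \gg \sigma, reading each price as a single bit loses almost nothing, which is why 3 sales suffice for 7 candidate amenities.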

3. Main Results

This section formalizes the intuition from the example above. Think about innovations in the price of asset n as the sum of a meaningful signal, f_n, and some noise, \epsilon_n:

(3)   \begin{align*} p_{n,t} - \mathrm{E}_{t-1}[p_{n,t}] &= f_n + \epsilon_n = \sum_{q=1}^Q \beta_q \cdot x_{n,q} + \epsilon_n \quad \text{with} \quad \epsilon_n \overset{\scriptscriptstyle \mathrm{iid}}{\sim} \mathrm{N}(0,\sigma^2) \end{align*}

where the signal can be decomposed into Q different attribute-specific shocks. In Equation (3) above, \beta_q \neq 0 denotes a shock of size |\beta_q| to the qth attribute and x_{n,q} \overset{\scriptscriptstyle \mathrm{iid}}{\sim} \mathrm{N}(0,\sfrac{1}{N}) denotes the extent to which asset n displays the qth attribute. This scaling normalizes each data column so that \sum_{n=1}^N \mathrm{Var}[x_{n,q}] = N \cdot \sfrac{1}{N} = 1.
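As a quick sanity check on this normalization, here is a short simulation; the dimensions N and Q below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
N, Q = 100, 500

# Each exposure x_{n,q} is drawn iid N(0, 1/N), so every column of X has
# expected squared norm sum_{n=1}^N Var[x_{n,q}] = N * (1/N) = 1
X = rng.normal(0.0, np.sqrt(1.0 / N), size=(N, Q))

avg_col_norm_sq = float((X ** 2).sum(axis=0).mean())
print(avg_col_norm_sq)   # close to 1 by the normalization
```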

In general, when there are more attributes than shocks, K < Q, picking out exactly which K attributes have realized a shock is a combinatorially hard problem as discussed in Natarajan (1995). However, suppose you had an oracle which could bypass this hurdle and tell you exactly which attributes had realized a shock:

(4)   \begin{align*} \Vert \mathbf{f} - \hat{\mathbf{f}}^{\text{Oracle}} \Vert_{\ell_2} &= \inf_{\{\hat{\boldsymbol \beta} : \#[\beta_q \neq 0] \leq K\}} \, \Vert \mathbf{f} - \mathbf{X}\hat{\boldsymbol \beta} \Vert_{\ell_2} \end{align*}

In this world, your mean squared prediction error, \mathrm{MSE} = \frac{1}{N} \cdot \Vert \mathbf{p} - \hat{\mathbf{p}} \Vert_{\ell_2}^2, is given by:

(5)   \begin{align*} \mathrm{MSE}^{\text{Oracle}} & = \min_{0 \leq K \leq Q} \, \left\{ \, \frac{1}{N^{\text{Oracle}}} \cdot \Vert \mathbf{p} - \hat{\mathbf{p}}^{\text{Oracle}} \Vert_{\ell_2}^2 \, \right\} = \min_{0 \leq K \leq Q} \, \left\{ \, \frac{1}{N^{\text{Oracle}}} \cdot \Vert \mathbf{f} - \hat{\mathbf{f}}^{\text{Oracle}} \Vert_{\ell_2}^2 + \sigma^2 \, \right\} \end{align*}

where N^{\text{Oracle}} = N^{\text{Oracle}}(Q,K) = K denotes the number of observations your oracle needs. e.g., if each \beta_q \overset{\scriptscriptstyle \mathrm{iid}}{\sim} \mathrm{Bernoulli}(\kappa), so that every shock has unit size, then \mathrm{MSE}^{\text{Oracle}} = \sigma^2 since there is only variation in the location of the shocks and not in their size.
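To make the oracle concrete, here is a minimal simulation in which the shock sizes, dimensions, and noise level are all illustrative assumptions. Given the true support, the oracle just runs OLS on those K columns, and its mean squared error lands near \sigma^2:

```python
import numpy as np

rng = np.random.default_rng(1)
N, Q, K, sigma = 50, 200, 5, 1.0

# Attribute exposures and a K-sparse shock vector (sizes are illustrative)
X = rng.normal(0.0, np.sqrt(1.0 / N), size=(N, Q))
support = rng.choice(Q, size=K, replace=False)
beta = np.zeros(Q)
beta[support] = 5.0

f = X @ beta                               # signal
p = f + rng.normal(0.0, sigma, size=N)     # observed price innovations

# Oracle: told exactly which K attributes were shocked, run OLS on them
b_hat, *_ = np.linalg.lstsq(X[:, support], p, rcond=None)
p_hat = X[:, support] @ b_hat

mse = float(np.mean((p - p_hat) ** 2))
print(mse)   # close to sigma^2 = 1, up to sampling noise
```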

It turns out that if each asset isn’t too redundant relative to the number of shocked attributes, then you can achieve a mean squared error that is within a log factor of the oracle’s mean squared error using many fewer observations than there are attributes, N \ll Q. e.g., suppose that you used a lasso estimator:

(6)   \begin{align*} \hat{\boldsymbol \beta}^{\text{Lasso}} &= \arg\min_{\hat{\boldsymbol \beta}} \, \left\{ \, \frac{1}{2} \cdot \Vert \mathbf{p} - \mathbf{X} \hat{\boldsymbol \beta} \Vert_{\ell_2}^2 + \lambda_{\ell_1} \cdot \sigma \cdot \Vert \hat{\boldsymbol \beta} \Vert_{\ell_1} \, \right\} \end{align*}

with \lambda_{\ell_1} = 2 \cdot \sqrt{2 \cdot \log Q}. Then, Candes and Davenport (2011) show that:

(7)   \begin{align*} \mathrm{MSE}^{\text{Lasso}} &\leq \gamma \cdot \inf_{0 \leq K \leq Q} \, \left\{ \, \frac{1}{K} \cdot \Vert \mathbf{f} - \hat{\mathbf{f}}^{\text{Oracle}} \Vert_{\ell_2}^2 + \log Q \cdot \sigma^2 \, \right\} \end{align*}

with probability 1 - 6 \cdot Q^{-2 \cdot \log 2} - Q^{-1} \cdot (2 \cdot \pi \cdot \log Q)^{-\sfrac{1}{2}} where \gamma > 0 is a small numerical constant. However, this guarantee is quite loose. i.e., what exactly does the condition that “each asset isn’t too redundant relative to the number of shocked attributes” mean? Exactly how many observations would you need to see if each asset’s attribute exposure is drawn as x_{n,q} \overset{\scriptscriptstyle \mathrm{iid}}{\sim} \mathrm{N}(0,\sfrac{1}{N})?
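A small simulation makes this concrete. The sketch below implements the lasso in Equation (6) directly via proximal gradient descent (ISTA) rather than relying on any particular library; the problem sizes and shock magnitudes are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
N, Q, K, sigma = 100, 300, 4, 0.05     # many fewer observations than attributes

X = rng.normal(0.0, np.sqrt(1.0 / N), size=(N, Q))
support = rng.choice(Q, size=K, replace=False)
beta = np.zeros(Q)
beta[support] = 1.0                    # illustrative shock size
p = X @ beta + rng.normal(0.0, sigma, size=N)

# Penalty level from the text: lambda_{l1} = 2 * sqrt(2 * log Q), times sigma
lam = 2.0 * np.sqrt(2.0 * np.log(Q)) * sigma

def lasso_ista(X, y, lam, n_iter=5000):
    """Proximal gradient (ISTA) for 0.5 * ||y - X b||^2 + lam * ||b||_1."""
    step = 1.0 / np.linalg.norm(X, 2) ** 2            # 1 / Lipschitz constant
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        z = b - step * (X.T @ (X @ b - y))            # gradient step on the smooth part
        b = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)   # soft-threshold
    return b

b_hat = lasso_ista(X, p, lam)
recovered = set(np.flatnonzero(np.abs(b_hat) > 0.3))
print(recovered == set(support))       # did the lasso find exactly the K shocked attributes?
```

With these parameters the lasso typically recovers the exact support even though N \ll Q, which is the point of the Candes and Davenport (2011) bound.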

Here’s where things get really interesting. Wainwright (2009) shows that there is a sharp bound on the number of observations, N^\star = N^\star(Q,K), that you need in order for \ell_1-type estimators like the lasso to succeed when attribute exposure is drawn iid Gaussian:

(8)   \begin{align*} N^\star(Q,K) &= \mathcal{O}\left[K \cdot \log(Q - K)\right] \end{align*}

with Q \to \infty, K \to \infty, and \sfrac{K}{Q} \to \kappa for some \kappa > 0. When traders observe N < N^\star(Q,K) observations, picking out which attributes have realized a shock is an NP-hard problem; whereas, when they observe N \geq N^\star(Q,K), there exist efficient convex optimization algorithms that solve it. This result shows how the N^\star = 3 location recovery bound from the motivating example generalizes to arbitrary numbers of attributes, Q, and shocks, K.
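Wainwright's threshold shows up clearly in simulation. The sketch below uses a bare-bones lasso solver and compares support-recovery rates for N well below and well above K \cdot \log(Q - K); the problem sizes, shock sizes, and trial counts are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
Q, K, sigma = 128, 3, 0.05
lam = 2.0 * np.sqrt(2.0 * np.log(Q)) * sigma

def lasso_ista(X, y, lam, n_iter=2000):
    """Proximal gradient (ISTA) for 0.5 * ||y - X b||^2 + lam * ||b||_1."""
    step = 1.0 / np.linalg.norm(X, 2) ** 2
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        z = b - step * (X.T @ (X @ b - y))
        b = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return b

def recovery_rate(N, trials=25):
    """Fraction of trials where the lasso finds exactly the K shocked attributes."""
    hits = 0
    for _ in range(trials):
        X = rng.normal(0.0, np.sqrt(1.0 / N), size=(N, Q))
        support = rng.choice(Q, size=K, replace=False)
        beta = np.zeros(Q)
        beta[support] = 1.0
        p = X @ beta + rng.normal(0.0, sigma, size=N)
        b_hat = lasso_ista(X, p, lam)
        hits += set(np.flatnonzero(np.abs(b_hat) > 0.3)) == set(support)
    return hits / trials

# 2 * K * log(Q - K) is roughly 29 here; compare N well below and well above it
rate_low, rate_high = recovery_rate(N=10), recovery_rate(N=80)
print(rate_low, rate_high)   # recovery is rare below the bound, routine above it
```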

4. Just Identified

I conclude this post by discussing the non-sparse case. i.e., \boldsymbol \beta usually isn’t sparse in econometric textbooks à la Hayashi, Wooldridge, or Angrist and Pischke. When every one of the Q attributes matters, it’s easy to decide which attributes to pay attention to: all of them. In this situation, the mean squared error for an oracle is the same as the mean squared error for mere mortals:

(9)   \begin{align*} \mathrm{MSE}^{\text{Oracle}} & = \frac{1}{Q} \cdot \Vert \mathbf{f} - \hat{\mathbf{f}}^{\text{Oracle}} \Vert_{\ell_2}^2 + \sigma^2 = \frac{1}{Q} \cdot \Vert \mathbf{f} - \hat{\mathbf{f}}^{\text{Mortal}} \Vert_{\ell_2}^2 + \sigma^2 = \mathrm{MSE}^{\text{Mortal}} \end{align*}

Does the location recovery bound disappear in this setting?

No. Instead, the location recovery bound now corresponds to the usual N \geq Q requirement for identification. To see why, let’s return to the motivating example in Section 2, and consider the case where any of the 7 attributes could have realized a shock. This leaves us with 128 different shock combinations:

(10)   \begin{align*} 128 &= {7 \choose 0} + {7 \choose 1} + {7 \choose 2} + {7 \choose 3} + {7 \choose 4} + {7 \choose 5} + {7 \choose 6} + {7 \choose 7} \\ &= 1 + 7 + 21 + 35 + 35 + 21 + 7 + 1 \\ &= 2^7 \end{align*}

so that N^\star = 7 sales gives just enough information to identify which combination of shocks was realized. More generally, the same counting logic applies for any number of attributes, Q:

(11)   \begin{align*} 2^Q &= \sum_{k=0}^Q {Q \choose k} \end{align*}

This gives an interesting information-theoretic interpretation to the meaning of “just identified” that has nothing to do with linear algebra or the invertibility of a matrix.
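As a sanity check, the counting identity behind this interpretation is easy to verify numerically:

```python
from math import comb

# 2^Q shock combinations, grouped by the number k of shocked attributes
for Q in (7, 10, 20):
    assert sum(comb(Q, k) for k in range(Q + 1)) == 2 ** Q

total_7 = sum(comb(7, k) for k in range(7 + 1))
print(total_7)   # 128, matching Equation (10)
```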