One of the most astonishing things about financial markets is that there is interesting economics operating at so many different scales. Yet no one would ever guess this fact by looking at standard asset pricing theory. To illustrate, take a look at the canonical Euler equation:

$$p_{n,t} \;=\; \mathrm{E}_t\!\left[\, m_{t+1} \cdot \bigl( p_{n,t+1} + d_{n,t+1} \bigr) \,\right] \tag{1}$$
Here, $p_{n,t}$ and $d_{n,t}$ denote the ex-dividend price and dividend payout of the $n$th asset in the economy at time $t$, $m_{t+1}$ denotes the prevailing stochastic discount factor, and $\mathrm{E}_t[\cdot]$ denotes the conditional expectations operator given time-$t$ information. Equation (1) says that the price of the $n$th asset in the current period, $p_{n,t}$, is equal to the expected discounted value of the asset’s price and dividend payout in the following period, $\mathrm{E}_t[m_{t+1}(p_{n,t+1} + d_{n,t+1})]$. At first glance this formulation seems perfectly sensible, but a closer look reveals two striking features:
- Time is dimensionless. i.e., Equation (1) is written in sequence time, not wall clock time. Each period could equally well represent a millisecond, an hour, a year, a millennium, or anything in between. We usually think of the stochastic discount factor, $m_{t+1}$, as a function of traders’ utility from aggregate consumption. Thus, as Cochrane (2001) points out, if “stocks go up between 12:00 and 1:00, it must be because (on average) we all decided to have a big lunch…. this seems silly.”
- The total number of stocks doesn’t show up anywhere in Equation (1). Not only do traders have to know when there is a profitable arbitrage opportunity somewhere out there in the market, they also have to find out exactly where this opportunity is and deploy the necessary funds and expertise to exploit it. Where’s Waldo? puzzles are hard for a reason. Identifying and trading into arbitrage opportunities is a fundamentally different activity when searching through thousands of candidate predictors rather than a handful. More is different. This is the key insight highlighted in Chinco (2012).
In this post, I start by writing down a simple statistical model of returns in Section 2 which allows for shocks at different time horizons and across asset groupings of various sizes. Then, in Sections 3 and 4, I show how shocks at vastly different scales are difficult for traders to spot (…let alone act on). Such shocks can look like noise to “distant” traders in a mathematically precise sense. In Section 5, I conclude with a discussion of these observations. The key takeaway is that financial theories do not necessarily need to be globally applicable to make effective local predictions. e.g., a theory governing the optimal behavior of a high frequency trader may not have any testable predictions at the quarterly investment horizon where institutional investors operate.
2. Statistical Model
I start by writing down a statistical model of returns that allows for shocks at different time scales and across asset groupings of different sizes. e.g., Apple’s stock returns might be simultaneously affected by not only bid-ask bounce at the millisecond investment horizon but also momentum at the monthly investment horizon. Alternatively, at the same investment horizon, Apple might realize both a firm-specific earnings announcement shock as well as a national economic shock felt by all US firms.
Let $h$ denote the smallest investment horizon, so that all other time scales are indexed by an integer $H \geq 1$:

$$t \;\to\; t + Hh.$$

i.e., a trader with investment horizon $H$ cares about percent returns over windows that are $H$ of these smallest intervals long.
For concreteness, you might think about $h \approx 1$ millisecond in modern asset markets. Thus, $H \approx 2.6 \times 10^9$ for a monthly investment horizon, meaning that asset market investment horizons span somewhere between $9$ and $10$ orders of magnitude from high frequency traders to buy and hold value investors. This is similar to the ratio of the height of a human to the diameter of the sun.
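To spell out the arithmetic behind these magnitudes (taking the smallest horizon to be one millisecond and a month to be roughly thirty days, both assumptions for illustration):

```latex
H \;=\; \frac{1\ \text{month}}{1\ \text{ms}}
  \;=\; 30 \times 24 \times 60 \times 60 \times 1000
  \;\approx\; 2.6 \times 10^{9},
\qquad
\log_{10}\!\bigl(2.6 \times 10^{9}\bigr) \;\approx\; 9.4
```

For comparison, the diameter of the sun ($\approx 1.4 \times 10^9$ meters) is about $8 \times 10^8$ times the height of a person ($\approx 1.7$ meters), the same order of magnitude.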
$$r_{n,t \to t+Hh} \;=\; \sum_{a} x_{n,a} \cdot \mu_{a,t \to t+Hh} \;+\; \epsilon_{n,t \to t+Hh} \tag{3}$$

where $x_{n,a} \in \{0,1\}$ denotes whether or not stock $n$ has attribute $a$, $\mu_{a,t \to t+Hh}$ denotes the mean growth rate in the price of all stocks with attribute $a$ from time $t$ through time $t+Hh$, and $\epsilon_{n,t \to t+Hh}$ denotes idiosyncratic noise in stock $n$’s percent return from time $t$ through time $t+Hh$. e.g., suppose that the mean growth rate of all technology stocks from January 1st, 1999 through the end of January 31st, 1999 was, say, $5\%$ or $0.05$. Then, I would write that:

$$\mu_{\mathit{tech},\,\mathrm{Jan}} \;=\; 0.05,$$
and Intel, Inc. would realize this boost in its January, 1999 returns since:

$$x_{\mathit{INTC},\,\mathit{tech}} \;=\; 1$$
The price shocks, $\mu_{a,t \to t+Hh}$, take on the form:

$$\mu_{a,t \to t+Hh} \;=\; \sigma_a \cdot \sum_{s=1}^{H} j_{a,s}, \qquad j_{a,s} \sim \mathrm{Bernoulli}(\lambda_a),$$
The summation captures the idea that all shocks occur in a particular instant and then cumulate over time. e.g., there is a particular time interval, $s$, during which a news release hits the wire or a market order flashes across the screen. Changes over time intervals longer than $h$ reflect the accumulation of changes across these tiny time intervals. The parameters $\sigma_a$ and $\lambda_a$ control the size and frequency of the $a$th shock. Each attribute’s size parameter $\sigma_a$ has units of percent per shock, and the bigger the $\sigma_a$ the bigger the impact of the $a$th shock on the returns of all stocks with that attribute. Each attribute’s frequency parameter $\lambda_a$ has units of shocks per $h$, and the bigger the $\lambda_a$ the more often all stocks with attribute $a$ realize a shock of size $\sigma_a$. The idiosyncratic return noise is the summation of Gaussian shocks at each interval:

$$\epsilon_{n,t \to t+Hh} \;=\; \sum_{s=1}^{H} \xi_{n,s}, \qquad \xi_{n,s} \overset{\mathrm{iid}}{\sim} \mathrm{N}(0, \sigma_\epsilon^2)$$
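Here is a minimal simulation sketch of this shock-plus-noise structure (Python; the Bernoulli arrival scheme for the jump indicators and every parameter value below are illustrative assumptions, not taken from the post):

```python
import numpy as np

rng = np.random.default_rng(0)

def attribute_shock_path(H, size, freq):
    """Cumulative shock to one attribute over H intervals of length h:
    a jump of `size` (percent per shock) arrives in each length-h
    interval with probability `freq` (shocks per h)."""
    arrivals = rng.random(H) < freq
    return size * np.cumsum(arrivals)

def return_path(H, x, sizes, freqs, sigma_eps):
    """Cumulative percent return of one stock: attribute shocks for the
    attributes the stock has (x[a] = 1), plus Gaussian noise each h."""
    shocks = sum(x_a * attribute_shock_path(H, s, f)
                 for x_a, s, f in zip(x, sizes, freqs))
    noise = np.cumsum(sigma_eps * rng.standard_normal(H))
    return shocks + noise

# one stock exposed to a rare/large shock and a frequent/small one
path = return_path(H=10_000, x=[1, 1],
                   sizes=[0.50, 0.01],    # percent per shock
                   freqs=[0.0005, 0.20],  # shocks per h
                   sigma_eps=0.005)
```

Plotting `path` at different samplings (every observation versus every thousandth) gives a feel for how the two shock processes dominate at different scales.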
3. Time Series
Very different financial theories can operate at vastly different time scales. e.g., attributes that are relevant at the millisecond time horizon will completely wash out by the monthly horizon and vice versa. In this section, I look at only the time series properties of one stock, so I suppress the $n$ subscript and write Equation (3) as:

$$r_{t \to t+Hh} \;=\; \sum_{a} x_{a} \cdot \mu_{a,t \to t+Hh} \;+\; \epsilon_{t \to t+Hh}$$
To see why, consider the problem of a value investor, Alice, operating at the monthly investment horizon. Suppose that she wants to know whether or not her arch nemesis Bill, a high frequency trader operating at the millisecond investment horizon, is actively trading in her asset. e.g., suppose that she is worried that Bill might have found some really clever new predictor that flits in and out of existence before she can take advantage of it. From Alice’s point of view, the relevant random variable is the standardized average of Bill’s shocks over each of her monthly periods, which has the unconditional distribution:

$$\bar{\mu}_{b} \;=\; \frac{1}{\sqrt{H}} \sum_{s=1}^{H} \frac{j_{b,s} - \lambda_b}{\sqrt{\lambda_b (1 - \lambda_b)}} \;\sim\; F_H,$$

where $j_{b,s} \in \{0,1\}$ indicates whether one of Bill’s shocks arrived during the $s$th millisecond-length interval and $\lambda_b$ is his shock arrival frequency.
Let $F_H$ denote the cumulative distribution function of $\bar{\mu}_b$. e.g., $F_H$ governs the cumulative distribution of the average of the shocks that Bill sees over the length of each period from Alice’s perspective. Then, via the Berry-Esseen theorem, we have that at the monthly investment horizon with $H \approx 2.6 \times 10^9$:

$$\sup_{x} \, \bigl| F_H(x) - \Phi(x) \bigr| \;\leq\; \frac{C \cdot \rho}{\sqrt{H}} \;\approx\; 10^{-5}, \tag{10}$$

where $\Phi$ denotes the standard normal CDF, $C < 1$ is an absolute constant, and $\rho$ denotes the third absolute moment of the standardized shocks. Equation (10) says that the maximum vertical distance between the CDF of the monthly mean of a variable fluctuating at the millisecond time scale and the CDF of the standard normal distribution is less than one part in one-hundred thousand.
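Plugging in the monthly horizon, and treating the absolute constant $C$ and the third-moment term $\rho$ in the Berry-Esseen bound as order one (an assumption for illustration):

```latex
\frac{C \cdot \rho}{\sqrt{H}}
\;\approx\; \frac{1}{\sqrt{2.6 \times 10^{9}}}
\;\approx\; \frac{1}{51{,}000}
\;\approx\; 2 \times 10^{-5}
```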
There are a couple of ways to put this figure in perspective. First, note that trading strategies have to generate abnormal returns per month that are orders of magnitude larger than a $10^{-5}$ deviation just to outpace trading costs. Second, note that Alice would need on the order of $(1.36/10^{-5})^2 \approx 10^{10}$ monthly observations (i.e., billions of years of data) to distinguish between a variable drawn from the standard normal distribution and $\bar{\mu}_b$ at this level of granularity via the Kolmogorov–Smirnov test. Thus, Bill’s behavior at the millisecond investment horizon is effectively noise to Alice when looking only at monthly data. In order to figure out what Bill is doing, she has to stoop down to his investment horizon.
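The same point can be made numerically. Below is a small sketch (Python; the skewed exponential stand-in for Bill’s shock distribution and the much smaller $H$ are assumptions for illustration, and a smaller $H$ only makes Alice’s detection problem easier than in the text):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Bill's millisecond-scale shocks: skewed and decidedly non-Gaussian
# (demeaned exponential draws), H sub-intervals per "month".
H, months = 10_000, 200
shocks = rng.exponential(1.0, size=(months, H)) - 1.0

# Alice only ever observes the standardized monthly averages.
monthly = shocks.mean(axis=1) * np.sqrt(H)

# Kolmogorov-Smirnov test against the standard normal: with a few
# hundred monthly observations, the averages are statistically
# indistinguishable from Gaussian noise.
stat, pvalue = stats.kstest(monthly, "norm")
```

Even though each underlying shock is badly skewed, the test statistic typically sits far below the rejection threshold: aggregation hides Bill completely.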
4. Cross Section
In the same way that different financial theories can operate at different time scales, different financial theories can also operate at vastly different levels of aggregation. On one hand, this statement is a bit obvious. After all, modern financial theory is built on the idea of risk minimization through portfolio diversification, and traders talk about strategies being “market neutral”. On the other hand, diversification is not the only force at work. Financial markets have many assets and traders use a vast number of predictors. What’s more, only a few of these predictors are useful at any point in time. As Warren Buffett says, “If you want to shoot rare, fast-moving elephants, you should always carry a loaded gun.” Pulling the trigger is easy. Finding the elephant is hard. Traders face a difficult search problem when trying to parse new shocks.
Suppose that Alice is a value investor specializing in oil and gas stocks and now wants to figure out where her other arch nemesis, Charlie, is trading in her market. Even if she knows that he is trading at roughly her investment horizon, it may still be hard for her to spot his price impact due to the vast number of possible strategies that he could be employing. In this section, I study the returns of $N$ stocks with $A$ attributes:

$$r_{n} \;=\; \sum_{a=1}^{A} x_{n,a} \cdot \mu_{a} \;+\; \epsilon_{n}$$
where I suppress all the time horizon arguments since I am concerned with the cross-section. For simplicity, suppose that Alice knows that Charlie is making a bet on only $K = 1$ of the $A$ attributes so that:

$$\mu \;=\; \bigl( 0, \; \ldots, \; 0, \; \mu_{a^\star}, \; 0, \; \ldots, \; 0 \bigr)^\top$$
where if $a \neq a^\star$, then $\mu_a = 0$ for all such $a$. e.g., Alice is worried that Charlie has spotted the one way of sorting all oil and gas stocks so that all the stocks with that attribute (e.g., operations in the Chilean Andes) have high returns and all of the stocks without the attribute have low returns. How many stocks does Alice have to follow in order for her to spot the sorting rule (i.e., the non-zero entry in $\mu$)?
It turns out that Alice only needs to examine $\log_2(A)$ stocks so long as she gets to pick exactly which ones. e.g., with $A = 8$ possible attributes, $3$ stocks suffice:
- Stock $1$: Has attributes $1$, $3$, $5$, $7$
- Stock $2$: Has attributes $2$, $3$, $6$, $7$
- Stock $3$: Has attributes $4$, $5$, $6$, $7$
The fact that Alice can identify the correct attribute even though she has fewer observations than possible attributes, $3 < 8$, is known as compressive sensing and was introduced by Candes and Tao (2005) and Donoho (2006). See Terry Tao’s blog post for an excellent introduction. For example, suppose that only the first stock had high returns:

$$r_1 = \mathrm{high}, \qquad r_2 = r_3 = \mathrm{low};$$
then Alice can be sure that Charlie has been sorting using the first of the $8$ stock attributes, since attribute $1$ is the only attribute that stock $1$ has and stocks $2$ and $3$ lack. The interesting part is that Alice can’t identify Charlie’s strategy using any fewer than $3$ stocks since:

$$2^3 = 8.$$
e.g., $3$ stocks gives Alice just enough combinations to answer $3$ yes or no questions.
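The bit-testing logic behind this count can be sketched in a few lines of Python (attributes indexed $0$ through $7$ rather than $1$ through $8$; the function name and encoding are illustrative, not from the original post):

```python
import numpy as np

A = 8                    # possible attributes, indexed 0 through 7
N = int(np.log2(A))      # stocks Alice needs: 3

# x[n, a] = 1 if stock n has attribute a: attribute a's membership
# pattern across the N stocks spells out the binary digits of a.
x = np.array([[(a >> n) & 1 for a in range(A)] for n in range(N)])

def identify_attribute(high_return, x):
    """Recover the single attribute behind the returns: high_return[n]
    is 1 if stock n rallied, else 0, so each stock answers one
    yes-or-no question about the attribute's binary digits."""
    n_stocks, n_attrs = x.shape
    candidates = [a for a in range(n_attrs)
                  if all(x[n, a] == high_return[n] for n in range(n_stocks))]
    assert len(candidates) == 1, "return pattern must pin down one attribute"
    return candidates[0]

# only stocks 0 and 2 rally, so the attribute is 0b101 = 5
print(identify_attribute([1, 0, 1], x))   # prints 5
```

Each stock’s high/low return reveals one bit of the hidden attribute’s index, which is exactly why $3$ stocks can distinguish among $2^3 = 8$ attributes and no fewer can.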
What’s more, this result generalizes to the case where the data matrix, $X$, is stochastic rather than deterministic. i.e., in real life Alice can’t decide how many oil and gas stocks with each attribute are traded each period in order to make it easiest to decipher Charlie’s trading strategy. Donoho and Tanner (2009) show that in a world where $X$ is an $(N \times A)$-dimensional random matrix with Gaussian entries, $x_{n,a} \sim \mathrm{N}(0,1)$, there is a maximum number of predictors, $K_{\max}$, above which it is impossible for Alice to spot the $K$ relevant attributes from among $A$ possibilities using only $N$ stocks, given by:

$$K_{\max} \;\approx\; \frac{N}{2 \cdot \log(A/N)} \quad \text{as } N/A \to 0,$$
and is summarized in the figure below, replicated from Donoho and Stodden (2006). The $x$-axis runs from $0$ to $1$ and gives values for $\delta = N/A$, summarizing the relative amount of data available to Alice. The $y$-axis also runs from $0$ to $1$ and gives values for $\rho = K/N$, summarizing the level of sparsity in the model. The underlying model is:

$$y \;=\; X \beta + z,$$
where $z \sim \mathrm{N}(0, \sigma_z^2 \cdot I_N)$, $\beta$ is zero everywhere except for $K$ nonzero entries, and each $x_{n,a} \sim \mathrm{N}(0,1)$ with columns normalized to unit length. The forward stepwise regression procedure enters variables into the model in a sequential fashion, according to greatest $t$-statistic value. The procedure iteratively takes the single regressor with the highest $t$-statistic until no remaining regressor clears the threshold $\sqrt{2 \cdot \log A}$ (i.e., the Bonferroni threshold). The threshold given by Donoho and Tanner (2009) then corresponds to the white diagonal line cutting through the phase space, above which the regression procedure fails and below which it succeeds.
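Here is a minimal sketch of that stepwise procedure (Python; the dimensions, noise level, and the correlation-based z-score used as a crude stand-in for the $t$-statistic are all assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

# N stocks, A candidate attributes, K true nonzero effects
N, A, K = 200, 1000, 5

X = rng.standard_normal((N, A))
X /= np.linalg.norm(X, axis=0)          # unit-length columns
beta = np.zeros(A)
support = rng.choice(A, size=K, replace=False)
beta[support] = 1.0
y = X @ beta + 0.05 * rng.standard_normal(N)

def forward_stepwise(X, y, threshold):
    """Greedily add the column whose score against the current
    residual is largest, refit, and stop once no remaining column
    clears the threshold."""
    selected, residual = [], y.copy()
    while len(selected) < X.shape[1]:
        scores = np.abs(X.T @ residual) / np.std(residual)
        if selected:
            scores[selected] = 0.0      # never re-enter a column
        j = int(np.argmax(scores))
        if scores[j] < threshold:
            break
        selected.append(j)
        coef, *_ = np.linalg.lstsq(X[:, selected], y, rcond=None)
        residual = y - X[:, selected] @ coef
    return sorted(selected)

# Bonferroni-style threshold sqrt(2 log A), as in the text
picked = forward_stepwise(X, y, threshold=np.sqrt(2 * np.log(A)))
```

With these dimensions the problem sits well inside the recoverable region, so the procedure finds all of the true attributes; shrinking $N$ or growing $K$ pushes it across the phase transition, where the greedy selection starts to fail.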
The interesting part about this result is that this bound on $K_{\max}$ comes from a deep theorem in high-dimensional geometry which relates both compressive sensing and error-correcting codes, as suggested by the deterministic example above. It is not due to any nitty-gritty details of Alice’s search problem. Notice how the original bound in the $K = 1$ and $A = 8$ example has an information theoretic interpretation: pinning down Charlie’s attribute takes $\log_2(8) = 3$ bits. Thus, Charlie can hide behind the sheer number of possible explanations in the cross section in the same way that Bill can hide behind the sheer number of observations in the time series.
5. Discussion

The speed at which traders interact has greatly increased over the past decade. e.g., Spread Networks reportedly invested hundreds of millions of dollars in a new fiber optic cable linking New York and Chicago via the straightest possible route, shaving milliseconds off the round-trip delay. Table 5 in Pagnotta and Philippon (2012) documents the many investments in speed made by exchanges around the world. What’s more, trading behavior at this time scale seems to be decoupled from asset fundamentals. i.e., it’s unlikely that a stock’s value truly follows any of the patterns found in one of Nanex’s crop-circle-of-the-day plots. Motivated by events such as the Flash Crash, there has been a great deal of discussion in recent years about the impact of high frequency trading on asset prices and welfare.
However, the rough calculations above suggest that traders with a monthly investment horizon might not even care about second-to-second fluctuations in asset prices. e.g., think of how high and low frequency radio bands can carry rock and classical music to your FM radio receiver without interfering with one another. High frequency trading may be revealing nothing about the fundamental value of the companies in the market place, but just because these traders make short-run returns behave strangely doesn’t mean that they will ruin the market for institutional investors trading at a longer horizon. In this light, perhaps the canonical Euler equation needs to have some additional input parameters, $h$, $H$, and $N$:

$$p_{n,t} \;=\; \mathrm{E}_t\!\left[\, m_{t+1}(h, H, N) \cdot \bigl( p_{n,t+1} + d_{n,t+1} \bigr) \,\right]$$

which define the range over which the theory is effective?