Investigation Bandwidth

1. Motivation

Time is dimensionless in modern asset-pricing theory. Consider, e.g., the canonical Euler equation:

(1)   \begin{align*} P_t &= \widetilde{\mathrm{E}}_t[ \, P_{t+1} + D_{t+1} \, ] \end{align*}

says that the price of an asset at time t (i.e., P_t) equals the risk-adjusted expectation at time t (i.e., \widetilde{E}_t[\cdot]) of the asset's price at time t+1 plus any dividends the asset pays out at time t+1 (i.e., P_{t+1} + D_{t+1}). Yet the theory never answers the question: “Plus 1 what?” Should we be thinking about seconds? Hours? Days? Years? Centuries? Millennia?

Why does this matter? An algorithmic trader adjusting his position each second worries about different risks than Warren Buffett, who has a median holding period of decades. e.g., Buffett studies cash flows, dividends, and business plans. By contrast, the probability that a firm paying out a quarterly dividend happens to pay its dividend during any randomly chosen 1-second time interval is \sfrac{1}{1814400}, i.e., roughly the odds of picking a single year at random from the span since the human and chimpanzee evolutionary lines diverged. Thus, if an algorithmic trader and Warren Buffett both looked at the exact same stock at the exact same time, they would have to use different risk-adjusted expectation operators:

(2)   \begin{align*} P_t &= \begin{cases}  \widetilde{\mathrm{E}}^{\text{Alg}}_t[ \, P_{t+1{\scriptscriptstyle \mathrm{sec}}} \, ] &\text{from algorithmic trader's p.o.v.} \\ \widetilde{\mathrm{E}}^{\text{WB}}_t[ \, P_{t+1{\scriptscriptstyle \mathrm{qtr}}} + D_{t+1{\scriptscriptstyle \mathrm{qtr}}} \, ] &\text{from Warren Buffett's p.o.v.} \end{cases} \end{align*}

This note gives a simple economic model in which traders endogenously specialize in looking for information at a particular time scale and ignore predictability at vastly different time scales.

2. Simulation

I start with a simple numerical simulation that illustrates why traders at the daily horizon will ignore price patterns at vastly different frequencies. Suppose that Cisco’s stock returns are composed of a constant growth rate \mu = \sfrac{0.04}{(480 \cdot 252)}, a daily wobble \beta \cdot \sin(2 \cdot \pi \cdot t) with \beta = \sfrac{1}{(480 \cdot 252)}, and a white-noise term \epsilon_t \overset{\scriptscriptstyle \mathrm{iid}}{\sim} \mathrm{N}(0,\sigma^2) with \sigma= \sfrac{0.12}{\sqrt{480 \cdot 252}}:

(3)   \begin{align*} R_t &= \mu + \beta \cdot \sin(2 \cdot \pi \cdot t) + \epsilon_t, \quad \text{for} \quad t = \sfrac{1}{480}, \sfrac{2}{480}, \ldots, \sfrac{10079}{480}, \sfrac{10080}{480} \end{align*}

I consider a world where the clock ticks forward in 1 minute increments so that each tick represents \sfrac{1}{480}th of a trading day. The figure below shows a single sample path of Cisco’s return process over the course of a month.

[Figure: daily wobble plus noise]
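For concreteness, here is a minimal NumPy sketch of this return process; the function name and one-month horizon are my own choices, not from the text:

```python
import numpy as np

# Parameters from the text: 480 one-minute ticks per day, 252 trading days per year.
TICKS_PER_DAY, DAYS_PER_YEAR = 480, 252
TICKS_PER_YEAR = TICKS_PER_DAY * DAYS_PER_YEAR
mu = 0.04 / TICKS_PER_YEAR                 # constant growth rate
beta = 1.0 / TICKS_PER_YEAR                # amplitude of the daily wobble
sigma = 0.12 / np.sqrt(TICKS_PER_YEAR)     # per-tick noise volatility

def simulate_returns(n_days=21, seed=0):
    """Simulate R_t = mu + beta*sin(2*pi*t) + eps_t at one-minute ticks,
    with t measured in trading days (t = 1/480, 2/480, ...)."""
    rng = np.random.default_rng(seed)
    t = np.arange(1, n_days * TICKS_PER_DAY + 1) / TICKS_PER_DAY
    eps = rng.normal(0.0, sigma, size=t.size)
    return t, mu + beta * np.sin(2.0 * np.pi * t) + eps

t, R = simulate_returns()  # one month (21 trading days) of minute-by-minute returns
```

A month of minute-by-minute data is 21 × 480 = 10,080 ticks, matching the index range in (3).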

What are the properties of this return process? First, the constant growth rate, \mu = \sfrac{0.04}{(480 \cdot 252)}, implies that Cisco has a 4{\scriptstyle \%} per year return on average. Second, the volatility of the noise component, \sigma= \sfrac{0.12}{\sqrt{480 \cdot 252}}, implies that the annualized volatility of Cisco’s returns is 12{\scriptstyle \%/\sqrt{\mathrm{Yr}}}. Finally, since:

(4)   \begin{align*} \frac{1}{\pi} \cdot \int_0^{2 \cdot \pi} [\sin(x)]^2 \cdot dx &= 1 \end{align*}

the choice of \beta = \sfrac{1}{(480 \cdot 252)} means that (in a world with a 0{\scriptstyle \%} riskless rate) a trading strategy which is long Cisco stock in the morning and short Cisco stock in the afternoon will generate a 100{\scriptstyle \%} return over the course of 1 year. i.e., this is a big daily wobble! If you start with \mathdollar 1 on the morning of January 1st, you end up with \mathdollar 2 on the evening of December 31st, on average, by following this trading strategy. The figure below confirms this math by simulating 100 year-long realizations of this trading strategy’s returns.

[Figure: cumulative trading-strategy returns]
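This check is easy to replicate. A minimal sketch, assuming the position is simply the sign of the wobble (one natural reading of the long-morning/short-afternoon rule):

```python
import numpy as np

# Parameters from the text: 480 one-minute ticks per day, 252 trading days per year.
TICKS_PER_YEAR = 480 * 252
mu = 0.04 / TICKS_PER_YEAR
beta = 1.0 / TICKS_PER_YEAR
sigma = 0.12 / np.sqrt(TICKS_PER_YEAR)

def strategy_terminal_wealth(seed):
    """Compound $1 over one year while long Cisco in the morning
    (wobble > 0) and short in the afternoon (wobble < 0)."""
    rng = np.random.default_rng(seed)
    t = np.arange(1, TICKS_PER_YEAR + 1) / 480
    R = mu + beta * np.sin(2.0 * np.pi * t) + rng.normal(0.0, sigma, TICKS_PER_YEAR)
    position = np.sign(np.sin(2.0 * np.pi * t))   # +1 in the morning, -1 in the afternoon
    return np.prod(1.0 + position * R)

# 100 year-long realizations of the strategy's terminal wealth.
wealth = np.array([strategy_terminal_wealth(seed) for seed in range(100)])
```

In this sketch the timing rule captures the wobble each day and takes $1 to roughly $2 over the year on average; the exact constant depends on how aggressively the position tracks the wobble.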

3. Trader’s Problem

Suppose you didn’t know the exact frequency of the wobble in Cisco’s returns. The wobble is equally likely to have a frequency of anywhere from \sfrac{1}{252} cycles per day to 480 cycles per day. Using the last month’s worth of data, suppose you estimated the regressions specified below:

(5)   \begin{align*} R_t &= \hat{\mu} + \hat{\beta} \cdot \sin(2 \cdot \pi \cdot f \cdot t) + \hat{\gamma} \cdot \cos(2 \cdot \pi \cdot f \cdot t) + \hat{\epsilon}_t \quad \text{for each} \quad \sfrac{1}{252} < f < 480 \end{align*}

and identified the frequency, f_{\min}, which best fit the data:

(6)   \begin{align*} f_{\min} &= \arg \min_{\sfrac{1}{252} < f < 480} \left\{ \, \hat{\sigma}(f) \, \right\} \end{align*}
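The grid search in (5) and (6) amounts to running one OLS regression per candidate frequency and keeping the frequency with the smallest residual standard deviation. A minimal sketch; the coarse grid near 1 cycle per day is my own simplification of the full \sfrac{1}{252}-to-480 range:

```python
import numpy as np

TICKS_PER_DAY = 480
N = 21 * TICKS_PER_DAY                        # one month of minute-by-minute returns
mu, beta = 0.04 / (480 * 252), 1.0 / (480 * 252)
sigma = 0.12 / np.sqrt(480 * 252)

def best_fit_frequency(R, t, grid):
    """Regress R on [1, sin(2*pi*f*t), cos(2*pi*f*t)] for each candidate f
    and return the frequency with the smallest residual standard deviation."""
    best_f, best_sigma = None, np.inf
    for f in grid:
        X = np.column_stack([np.ones_like(t),
                             np.sin(2.0 * np.pi * f * t),
                             np.cos(2.0 * np.pi * f * t)])
        coef = np.linalg.lstsq(X, R, rcond=None)[0]
        sigma_hat = (R - X @ coef).std()
        if sigma_hat < best_sigma:
            best_f, best_sigma = f, sigma_hat
    return best_f

rng = np.random.default_rng(1)
t = np.arange(1, N + 1) / TICKS_PER_DAY
R = mu + beta * np.sin(2.0 * np.pi * t) + rng.normal(0.0, sigma, N)
grid = np.linspace(0.5, 1.5, 101)             # candidate frequencies, cycles per day
f_min = best_fit_frequency(R, t, grid)
```

Because the signal-to-noise ratio is low at these parameter values, f_{\min} is itself a noisy estimate, which is exactly why its sampling distribution has some width.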

The figure below shows the empirical distribution of these best in-sample fit frequencies when the true frequency is a daily wobble. The figure reads: “A month’s worth of Cisco’s minute-by-minute returns best fits a factor with a frequency of \sfrac{1}{1.01{\scriptstyle \mathrm{days}}} about 2{\scriptstyle \%} of the time when the true frequency is 1 cycle a day.”

[Figure: best in-sample fit frequencies]

Suppose you notice that a wobble with a frequency of \sfrac{1}{1.01{\scriptstyle \mathrm{days}}} fits Cisco’s returns over the last month really well, but you also know that this is a noisy in-sample estimate. The true wobble could have a different frequency. If you can expend some cognitive effort to investigate alternate frequencies, how wide a bandwidth of frequencies should you investigate? Here’s where things get interesting. The figure above essentially says that you should never investigate frequencies outside of f_{\min} \pm 0.5 \cdot \sfrac{1}{21}, i.e., plus or minus half the width of the bell. The probability that a pattern in returns with a frequency outside this range is actually driving the results is nil!

4. Costs and Benefits

Again, suppose you’re a trader who’s noticed that there is a daily wobble in Cisco’s returns over the past month. i.e., using the past month’s data, you’ve estimated f_{\min} = \sfrac{1}{1{\scriptstyle \mathrm{day}}}. Just as before, it’s a big wobble. Implemented at the right time scale, f_\star, you know that this strategy of buying early and selling late will generate a R(f_\star) = 100{\scriptstyle \%/\mathrm{yr}} = 8.33{\scriptstyle \%/\mathrm{mon}} return. Nevertheless, you also know that f_{\min} isn’t necessarily the right frequency to invest in just because it had the lowest in-sample error over the last month. You don’t want to go to your MD and pitch a strategy only to have to adjust it a month later due to poor performance. Let’s say that it costs you \kappa dollars to investigate a range of \delta frequencies. If you investigate a particular range and f_\star is there, then you will discover f_\star with probability 1.

The question is then: “Which frequency buckets should you investigate?” First, are we losing anything by only searching \delta-sized increments? Well, we can tile the entire frequency range with tiny \delta-sized increments as follows:

(7)   \begin{align*} 1 - \Delta(x,N) &= \sum_{n=0}^{N-1} \mathrm{Pr}\left[ \, x + n \cdot \delta \leq f_\star < x + (n + 1) \cdot \delta \, \middle| \, f_{\min} \, \right]  \end{align*}

i.e., starting at frequency x we can iteratively add N different increments of size \delta. If we start at a small enough frequency, x, and add enough increments, N, then we can tile as much of the entire domain as we like so that \Delta(x,N) is as small as we like.

Next, what are the benefits of discovering the correct time scale to invest in? If R(f_{\star}) denotes the returns to investing in a trading strategy at the correct time scale over the course of the next month, let:

(8)   \begin{align*} \mathrm{Corr}[R(f_{\star}),R(f_{\min})] &= C(f_{\star},f_{\min}) \end{align*}

denote the correlation between the returns of the strategy at the true frequency and the strategy at the best in-sample fit frequency. We know that C(f_{\star},f_{\star}) = 1 and that:

(9)   \begin{align*} \frac{dC(f_{\star},f_{\min})}{d|\log f_{\star} - \log f_{\min}|} < 0 \qquad \text{with} \qquad \lim_{|\log f_{\star} - \log f_{\min}| \to \infty} R(f_{\min}) = 0 \end{align*}

i.e., as f_{\min} gets farther and farther away from f_{\star}, your realized returns over the next month from a trading strategy implemented at horizon f_{\min} will become less and less correlated with the returns of the strategy implemented at f_{\star} and as a consequence shrink to 0. Thus, the benefit to discovering that the true frequency was not f_{\min} is given by (1 - C(f_\star,f_{\min})) \cdot R(f_{\star}).

Putting the pieces together, it’s clear that you should investigate a particular range of frequencies for a confounding explanation if the expected probability of finding f_{\star} there given the realized f_{\min} times the benefit of discovering the true f_{\star} in that range exceeds the search cost \kappa:

(10)   \begin{align*} \kappa &\leq \underbrace{\mathrm{Pr}\left[ \, x + n \cdot \delta \leq f_\star < x + (n + 1) \cdot \delta \, \middle| \, f_{\min} \, \right]}_{\substack{\text{Probability of finding $f_\star$ in a } \\ \text{particular range given observed $f_{\min}$.}}} \cdot \overbrace{(1 - C(f_\star,f_{\min})) \cdot R(f_{\star})}^{\substack{\text{Benefit of} \\ \text{discovery}}} \end{align*}

i.e., you’ll have a donut-shaped search pattern around f_{\min}. You won’t investigate frequencies that are really different from f_{\min}, since the probability of finding f_{\star} there is too low to justify the search costs. By contrast, you won’t investigate frequencies that are too similar to f_{\min}, since the benefits of discovering such a minuscule error don’t justify the costs, even though tiny errors may be quite likely.
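To make the decision rule in (10) concrete, here is a sketch that plugs in assumed functional forms: a Gaussian shape for the probability that f_\star lies in a bucket and an exponential decay for the correlation C in log-frequency distance. Both shapes (and the `scale` parameter) are my illustrative assumptions, not part of the model:

```python
import numpy as np

def investigate(buckets, f_min, kappa, R_star=0.0833, scale=0.25):
    """Apply the cost-benefit rule: search a bucket iff
    Pr[f_star in bucket | f_min] * (1 - C(f_star, f_min)) * R(f_star) >= kappa.
    The Gaussian posterior shape and exponential correlation decay in
    log-frequency distance are illustrative assumptions."""
    decisions = []
    for lo, hi in buckets:
        # Log-distance from f_min to the bucket's geometric midpoint.
        d = abs(np.log(np.sqrt(lo * hi)) - np.log(f_min))
        prob = np.exp(-0.5 * (d / scale) ** 2)           # assumed posterior mass in bucket
        benefit = (1.0 - np.exp(-d / scale)) * R_star    # assumed (1 - C) * R(f_star)
        decisions.append(prob * benefit >= kappa)
    return decisions

# Buckets just beside f_min, moderately far away, and very far away:
donut = investigate([(0.99, 1.01), (1.2, 1.4), (8.0, 10.0)], f_min=1.0, kappa=0.002)
```

With these assumed shapes, the nearby bucket fails the benefit test, the distant bucket fails the probability test, and only the intermediate ring gets searched, reproducing the donut.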

5. Wrapping Up

I started with the question: “How can it be that an algorithmic trader and Warren Buffett worry about different patterns in the same price path?” In the analysis above, I give one possible answer. If you see a tradable anomaly at a particular time scale (e.g., 1 wobble per day) over the past month, then the probability that this anomaly was caused by a data-generating process with a much shorter or much longer frequency is essentially 0. I used only sine-wave-plus-noise processes above, but it seems like this assumption can be relaxed via results from, say, Freidlin and Wentzell.