The Law of Small Numbers

1. Introduction

The “law of small numbers” is the name Tversky and Kahneman (1971) gave to the well documented empirical regularity that people tend to overinfer from small samples. This post discusses a few of the results from Rabin (2002), which applies the law of small numbers to the beliefs of stock market traders. The paper is particularly nice because it captures this behavioral bias, and many of its interesting implications, with only a small tweak to a simple Bayesian learning problem.

This post contains two parts. First, in Section 2, I characterize the biased beliefs of a trader who suffers from the law of small numbers. For brevity, I refer to this trader as Bob in the text below. Then, in Section 3, I show how returns in a market populated by Bobs would display excess volatility.

2. The Core Idea

First, I define our hero’s problem. Suppose that Bob watches a sequence of signals s_t \in \{a,b\} for t \in \{1, 2,\ldots\}. Each period’s signal is an iid Bernoulli draw with intensity \theta:

(1)   \begin{align*} \mathtt{Pr}[s_t=a] &= \theta, \qquad \theta \in [0,1] \end{align*}

There are a finite number of possible \theta‘s, and Bob doesn’t know which \theta governs the stream of signals he observes. Let \Theta denote the set of all rates that occur with positive prior probability, \pi(\theta) > 0, so that \sum_{\theta \in \Theta}\pi(\theta) = 1. Bob’s challenge is to infer which \theta is governing the string of signals he is observing.

Next, I define Bob’s inference strategy in light of his bias due to the law of small numbers. Suppose that he has the correct prior \pi over \Theta and updates as a full Bayesian; however, he believes that there is some positive integer N such that signals are drawn without replacement from an urn containing \theta \cdot N signals of s_t = a and (1 - \theta) \cdot N signals of s_t = b. Finally, so that the game does not end after N periods, Bob thinks that this urn is refilled every two draws. Thus, while odd and even draws are correlated, pairs of draws are iid.
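To make Bob’s perceived data-generating process concrete, here is a minimal Python sketch (the function name and the use of exact fractions are my own) of the probability he assigns to a pair of consecutive draws from a freshly refilled urn:

```python
from fractions import Fraction

def pair_prob(pair, theta, N):
    """Bob's perceived probability of a pair of consecutive signals
    (an odd-period draw followed by an even-period draw) taken without
    replacement from an urn holding theta*N a-signals and (1-theta)*N
    b-signals that is refilled every two draws."""
    n_a = Fraction(theta) * N          # a-signals remaining in the urn
    n_b = (1 - Fraction(theta)) * N    # b-signals remaining in the urn
    p = Fraction(1)
    for s in pair:
        total = n_a + n_b
        if s == 'a':
            p *= n_a / total
            n_a -= 1
        else:
            p *= n_b / total
            n_b -= 1
    return p
```

For example, with \theta = 1/2 and N = 4, Bob assigns probability (1/2)(1/3) = 1/6 to the pair aa, whereas drawing with replacement would give 1/4; the four pair probabilities still sum to one.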

In order for this inference strategy to be well defined, Bob must believe that, for some \theta \in \Theta, at least two signals of each type can be drawn from the urn at each point in time. Thus, there exists \theta \in \Theta such that:

(2)   \begin{align*} \min \left\{ \theta \cdot N, (1 - \theta) \cdot N \right\} &\geq 2 \end{align*}

implying that N \geq 4. Let \pi_t^N(h_t) denote Bob’s posterior beliefs about the probability of each \theta \in \Theta governing his string of signals after a history of signals h_t = \{s_1, s_2,\ldots,s_t\}, given that he is a type-N sufferer of the law of small numbers. As a clarifying example, note that \pi_t^\infty(h_t) represents the beliefs of a fully rational agent. In the text below, I will call this fully rational agent Alice for concreteness.

With his problem and inference strategy in place, I now prove two results characterizing Bob’s beliefs. I first compute Bob’s beliefs immediately after observing the first signal, s_1 = a or s_1 = b:

Proposition: For all N, \pi and \theta:

(3)   \begin{align*} \pi_1^N(\theta|s_1=a) &= \frac{\theta \cdot \pi(\theta)}{\sum_{\theta' \in \Theta} \theta' \cdot \pi(\theta')} \\ \pi_1^N(\theta|s_1=b) &= \frac{(1 - \theta) \cdot \pi(\theta)}{\sum_{\theta' \in \Theta} (1 - \theta') \cdot \pi(\theta')} \end{align*}

so that both \pi_1^N(s_2 = a|s_1 = a) and \pi_1^N(s_2 = b|s_1=b) are increasing in N.

Proof: The expressions for \pi_1^N(\theta|s_1=a) and \pi_1^N(\theta|s_1=b) follow immediately from Bayes’ rule as, for example:

(4)   \begin{align*} \pi_1^N(\theta|s_1=a) &= \frac{\mathtt{Pr}[s_1 = a|\theta] \cdot \mathtt{Pr}[\theta]}{\mathtt{Pr}[s_1 = a]} \\ &= \frac{\theta \cdot \pi(\theta)}{\sum_{\theta' \in \Theta} \theta' \cdot \pi(\theta')} \end{align*}

The fact that \pi_1^N(s_2 = a|s_1 = a) is increasing in N follows from the law of total probability:

(5)   \begin{align*} \pi_1^N(s_2=a|s_1=a) &= \sum_{\theta \in \Theta} \pi_1^N(\theta|s_1=a) \cdot \pi_1^N(s_2=a|\theta,s_1=a) \\ &= \sum_{\theta \in \Theta} \pi_1^N(\theta|s_1 = a) \cdot \left( \frac{\theta \cdot N - 1}{N-1} \right) \end{align*}

\pi_1^N(s_2=a|\theta,s_1=a) = (\theta \cdot N - 1)/(N-1) follows from the fact that Bob believes the signals are drawn without replacement from an urn N signals deep in which one a signal has already been removed. Since \pi_1^N(\theta|s_1=a) is independent of N and (\theta \cdot N - 1)/(N-1) is increasing in N, \pi_1^N(s_2=a|s_1=a) is increasing in N. The result for \pi_1^N(s_2 = b|s_1 = b) follows by symmetry.
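This comparative static is easy to check numerically. The sketch below (my own; the two-point support \Theta = \{1/4, 3/4\} with a uniform prior is an arbitrary illustrative choice) implements equations (3) and (5):

```python
from fractions import Fraction

def next_a_given_a(prior, N):
    """Bob's probability that s_2 = a given s_1 = a, where prior maps
    each theta to pi(theta).  The posterior over theta comes from
    eq. (3); the urn-based predictive probability comes from eq. (5)."""
    denom = sum(th * p for th, p in prior.items())
    post = {th: th * p / denom for th, p in prior.items()}            # eq. (3)
    return sum(q * (th * N - 1) / (N - 1) for th, q in post.items())  # eq. (5)

# Illustrative prior: theta is 1/4 or 3/4 with equal probability.
prior = {Fraction(1, 4): Fraction(1, 2), Fraction(3, 4): Fraction(1, 2)}
```

With this prior, Bob’s predictive probability rises with N (4/7 at N = 8, 13/22 at N = 12), approaching Alice’s fully rational value of 5/8 as N \to \infty.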

There are two interesting features of this result. First, note that Bob’s first-period beliefs are identical to those of a proper Bayesian. Second, because he believes that the signals are drawn from an urn without replacement, Bob underestimates the probability of drawing two a‘s in a row or two b‘s in a row, and this underestimation shrinks as the size of the urn grows.

Next, I characterize Bob’s posterior beliefs about two different \theta‘s given an extreme set of signals:

Proposition: Let h_t^a be a history of a signals and let h_t^b be a history of b signals. For all t > 1 and \theta, \theta' \in \Theta such that \theta > \theta', both \pi_t^N(\theta|h_t^a)/\pi_t^N(\theta'|h_t^a) and \pi_t^N(\theta'|h_t^b)/\pi_t^N(\theta|h_t^b) are strictly decreasing in N.

Proof: For even t, note that:

(6)   \begin{align*} \frac{\pi_t^N(\theta|h_t^a)}{\pi_t^N(\theta'|h_t^a)} &= \frac{\pi(\theta)}{\pi(\theta')} \cdot \left( \frac{\theta \cdot (\theta \cdot N - 1)}{\theta' \cdot (\theta' \cdot N - 1)} \right)^{\frac{t}{2}} \end{align*}

Thus, this ratio is strictly decreasing in N if and only if \theta > \theta'. Extending the argument to odd values of t only changes the counting convention, and symmetry yields the same result for \pi_t^N(\theta'|h_t^b)/\pi_t^N(\theta|h_t^b).
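Again, the comparative static can be verified directly. A small sketch (my own; the values \theta = 3/4, \theta' = 1/4 and a uniform prior are purely illustrative) computes the posterior ratio for t = 2, with a prior-ratio factor included for generality:

```python
from fractions import Fraction

def posterior_ratio_aa(theta, theta_p, N, prior_ratio=Fraction(1)):
    """pi_2^N(theta | aa) / pi_2^N(theta_p | aa): the prior ratio times
    the ratio of Bob's urn likelihoods of drawing a twice in a row."""
    theta, theta_p = Fraction(theta), Fraction(theta_p)
    return prior_ratio * (theta * (theta * N - 1)) / (theta_p * (theta_p * N - 1))
```

With \theta = 3/4 and \theta' = 1/4, the ratio falls from 15 at N = 8 to 12 at N = 12, converging down to the rational value \theta^2/\theta'^2 = 9 as N \to \infty.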

This proposition implies that, following an extreme sequence of signals, Bob overinfers that he is facing an extreme rate. Intuitively, if Bob thinks that the signals are drawn from an urn without replacement, then he is too surprised when he sees extreme signals because, once a signal of s_t = a has been drawn in an odd period, he believes that same signal is less likely to be drawn again in the following even period.

3. Excess Volatility

I now apply this reasoning to the behavior of returns in a market populated by Bobs. First, I describe the assets. Consider a market with countably infinitely many stocks indexed by i \in \{1,2,\ldots\}. Each month, every stock realizes either a positive or a negative return, denoted by r_{i,t} = a for positive returns and r_{i,t} = b for negative returns, drawn iid from a Bernoulli distribution with parameter \theta_i \in [0,1]. Thus, conditional on \theta_i, a positive return for stock i today carries no information about its return tomorrow. Suppose that a fraction \phi(1/2) = 5/7 of the stocks have \theta_i = 1/2, a fraction \phi(0) = 1/7 have \theta_i = 0, and the remaining fraction \phi(1) = 1/7 have \theta_i = 1.
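One subtlety of this setup is that, although returns are serially independent conditional on \theta_i, a trader who is still learning \theta_i does update after a positive return. A quick computation (my own sketch, using the parameter distribution above) gives Alice’s predictive probability of a second positive return, E[\theta^2]/E[\theta]:

```python
from fractions import Fraction

# Distribution of return parameters theta_i across stocks, from the text.
PHI = {Fraction(0): Fraction(1, 7),
       Fraction(1, 2): Fraction(5, 7),
       Fraction(1): Fraction(1, 7)}

# A Bayesian's probability that next month's return is positive given
# this month's return was positive: E[theta^2] / E[theta].
num = sum(th ** 2 * p for th, p in PHI.items())
den = sum(th * p for th, p in PHI.items())
alice_predictive = num / den
print(alice_predictive)  # 9/14
```

So even for Alice the predictive probability rises above 1/2 after one positive return, purely through learning about \theta_i.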

Next, I describe the trading strategy of the Bobs, whom I index with j \in \{1,2,\ldots\}. Let z_t^{(j)} denote the list of stocks not yet chosen by Bob j from month 0 up to but not including month t. Each Bob then adheres to the following trading strategy:

  1. At month t=0, Bob j picks one stock i at random and holds one share for the next four months, t \in \{1,2,3,4\}.
  2. Then, in month t=4, Bob j sells this share and picks a new stock at random from z_4^{(j)}. He buys a share and holds onto it for the next four months, t \in \{5,6,7,8\}.
  3. Then, in month t=8, Bob j sells this share and picks a new stock at random from z_8^{(j)}. He buys a share and holds onto it for the next four months, t \in \{9,10,11,12\}.
  4. And so on\ldots

Thus, via the law of large numbers, each stock will have the same number of Bobs holding it at each point in time, with exactly 1/4 of its holders exchanging the stock for another each month.

I now consider the average beliefs of traders in a market populated by Bobs who suffer from the law of small numbers. First, in the left two columns of the table below, I compute the probability that Bayesian traders and traders suffering from the law of small numbers assign to stock i‘s return parameter being \theta_i = 1 after observing different strings of returns. Then, in the right two columns of the same table, I compute the probability that these two types of traders believe the next return will be r_{i,t}=a given these previous return realizations.

[Table: beliefs that \theta_i = 1 and that r_{i,t} = a, for Bayesian and type-N traders, after various return histories. Table not rendered.]

Consistent with the second proposition in Section 2 above, note that the Bobs overestimate the probability that an asset’s returns are generated by the parameter \theta_i=1 following a string of positive returns. Next, in the table below, I conclude by computing the average belief about the probability that r_{i,t}=a among both Bayesian traders (i.e., Alices) and traders suffering from the law of small numbers (i.e., Bobs), averaged over the four groups of traders who have seen no signals, one signal, two signals, and three signals for asset i, respectively. Again, this table reveals that, following extremely positive return histories, the Bobs overestimate the probability of \theta_i = 1 and thus of r_{i,t}=a; however, following more balanced histories, the Bobs underestimate the probability that r_{i,t} = a relative to the Bayesian Alices.
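Table entries of this kind can be reproduced with a short script. The sketch below (my own; N = 10 is an illustrative urn size) computes a posterior over \theta_i after an arbitrary return history, treating draws as coming in refilled pairs for Bob:

```python
from fractions import Fraction

# Distribution of theta_i across stocks, from the text.
PRIOR = {Fraction(0): Fraction(1, 7),
         Fraction(1, 2): Fraction(5, 7),
         Fraction(1): Fraction(1, 7)}

def likelihood(history, theta, N):
    """Perceived likelihood of a return history such as 'aab'.
    N = None gives the fully Bayesian (Alice) likelihood; otherwise
    draws come in pairs, without replacement, from a theta*N urn that
    is refilled every two draws (Bob)."""
    p = Fraction(1)
    for i, r in enumerate(history):
        if N is None or i % 2 == 0:                     # fresh urn / Alice
            pa = Fraction(theta)
        else:                                           # depleted urn
            removed = 1 if history[i - 1] == 'a' else 0
            pa = (Fraction(theta) * N - removed) / (N - 1)
        p *= pa if r == 'a' else 1 - pa
    return p

def posterior(history, N):
    """Posterior over theta given the history, by Bayes' rule."""
    w = {th: likelihood(history, th, N) * p for th, p in PRIOR.items()}
    z = sum(w.values())
    return {th: v / z for th, v in w.items()}
```

For instance, after the history aaa, Alice’s posterior on \theta_i = 1 is 8/13 \approx 0.62 while a type-10 Bob’s is 9/14 \approx 0.64: Bob overinfers the extreme rate.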

[Table: average beliefs about the probability that r_{i,t} = a among Alices and Bobs, across trader groups. Table not rendered.]

Thus, if all traders were Bobs, they would overreact to strings of positive returns and generate excess volatility.