The Law of Small Numbers

1. Introduction

The “law of small numbers” is the name Tversky and Kahneman (1971) gave to the well-documented empirical regularity that people tend to overinfer from small samples. This post discusses a few of the results from Rabin (2002), which applies the law of small numbers to the beliefs of stock market traders. This paper is particularly nice because it captures this behavioral bias and its many interesting implications using only a small tweak to a simple Bayesian learning problem.

This post contains two parts. First, in Section 2, I characterize the biased beliefs of a trader who suffers from the law of small numbers; for brevity, I refer to this trader as Bob in the text below. Then, in Section 3, I show how returns in a market populated by Bobs would display excess volatility.

2. The Core Idea

First, I define our hero’s problem. Suppose that Bob watches a sequence of signals s_t \in \{a,b\} for t \in \{1, 2,\ldots\}. Each period, the signal s_t is an iid draw from a Bernoulli distribution whose parameter \theta is the probability that s_t = a:

(1)   \begin{align*} \mathtt{Pr}[s_t=a] &= \theta, \qquad \theta \in [0,1] \end{align*}

There is a finite set of possible \theta‘s, and Bob doesn’t know which \theta governs the stream of signals he observes. Let \Theta denote the set of all rates that occur with positive prior probability, \pi(\theta) > 0, so that \sum_{\theta \in \Theta}\pi(\theta) = 1. Bob’s challenge is to infer which \theta is governing the string of signals he is observing.

Next, I define Bob’s inference strategy in light of his bias due to the law of small numbers. Suppose that he has correct prior beliefs \pi over the possible \theta‘s and is fully Bayesian; however, he believes that there is some positive integer N such that signals are drawn without replacement from an urn containing \theta \cdot N signals of s_t = a and (1 - \theta) \cdot N signals of s_t = b. Finally, so that the game does not end after N periods, Bob thinks that this urn is refilled every two draws. Thus, in Bob’s mind, the two draws within each pair (periods 1 and 2, periods 3 and 4, and so on) are negatively correlated, while successive pairs are iid.
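To make Bob’s perceived data-generating process concrete, here is a small Python sketch that enumerates every ordered pair of draws from one urn of N balls, \theta \cdot N of them marked a, and tallies the probability of each pair of signals. The function name bob_pair_probs is hypothetical, and \theta is assumed to be passed as an exact fraction with \theta \cdot N an integer.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations


def bob_pair_probs(theta, N):
    """Probability of each ordered pair of signals within one refill of Bob's urn.

    The urn holds theta*N balls marked 'a' and (1 - theta)*N marked 'b'; Bob
    draws two balls without replacement before the urn is refilled.
    """
    n_a = Fraction(theta) * N
    if n_a.denominator != 1:
        raise ValueError("theta * N must be an integer")
    urn = ["a"] * int(n_a) + ["b"] * (N - int(n_a))
    # Every ordered pair of distinct balls is equally likely.
    draws = list(permutations(range(N), 2))
    counts = Counter((urn[i], urn[j]) for i, j in draws)
    return {pair: Fraction(c, len(draws)) for pair, c in counts.items()}


probs = bob_pair_probs(Fraction(1, 2), 4)
print(probs[("a", "a")])  # (2/4) * (1/3) = 1/6, below the iid value 1/4
```

The probability of two a‘s in a row equals \theta \cdot (\theta \cdot N - 1)/(N - 1), the within-pair term that appears in equation (5).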

In order for this inference strategy to be well defined, Bob must believe that there is some \theta \in \Theta for which the urn contains at least two a signals and at least two b signals. Thus, there exists \theta \in \Theta such that:

(2)   \begin{align*} \min \left\{ \theta \cdot N, (1 - \theta) \cdot N \right\} &\geq 2 \end{align*}

implying that N \geq 4, since \theta \cdot N + (1 - \theta) \cdot N = N. Let \pi_t^N(h_t) represent Bob’s posterior beliefs about the probability of each \theta \in \Theta governing his string of signals after a history of signals h_t = \{s_1, s_2,\ldots,s_t\}, given that he is a type-N sufferer of the law of small numbers. As a clarifying example, note that \pi_t^\infty(h_t) represents the beliefs of a fully rational agent. In the text below, I will call this fully rational agent Alice for concreteness.

With his problem and inference strategy in place, I now prove two results characterizing Bob’s beliefs. I first compute Bob’s beliefs immediately after he sees either a or b as the signal in an odd period, taking s_1 as the example:

Proposition: For all N, \pi and \theta:

(3)   \begin{align*} \pi_1^N(\theta|s_1=a) &= \frac{\theta \cdot \pi(\theta)}{\sum_{\theta' \in \Theta} \theta' \cdot \pi(\theta')} \\ \pi_1^N(\theta|s_1=b) &= \frac{(1 - \theta) \cdot \pi(\theta)}{\sum_{\theta' \in \Theta} (1 - \theta') \cdot \pi(\theta')} \end{align*}

so that both \pi_1^N(s_2 = a|s_1 = a) and \pi_1^N(s_2 = b|s_1=b) are increasing in N.

Proof: The expressions for \pi_1^N(\theta|s_1=a) and \pi_1^N(\theta|s_1=b) follow immediately from Bayes’ rule as, for example:

(4)   \begin{align*} \pi_1^N(\theta|s_1=a) &= \frac{\mathtt{Pr}[s_1 = a|\theta] \cdot \mathtt{Pr}[\theta]}{\mathtt{Pr}[s_1 = a]} \\ &= \frac{\theta \cdot \pi(\theta)}{\sum_{\theta' \in \Theta} \theta' \cdot \pi(\theta')} \end{align*}

The fact that \pi_1^N(s_2 = a|s_1 = a) is increasing in N follows from conditioning on \theta (the law of total probability):

(5)   \begin{align*} \pi_1^N(s_2=a|s_1=a) &= \sum_{\theta \in \Theta} \pi_1^N(\theta|s_1=a) \cdot \pi_1^N(s_2=a|\theta,s_1=a) \\ &= \sum_{\theta \in \Theta} \pi_1^N(\theta|s_1 = a) \cdot \left( \frac{\theta \cdot N - 1}{N-1} \right) \end{align*}

\pi_1^N(s_2=a|\theta,s_1=a) = (\theta \cdot N - 1)/(N-1) follows from the fact that Bob believes the signals are drawn without replacement from an urn of N signals from which one a signal has already been removed. Since \pi_1^N(\theta|s_1=a) is independent of N and (\theta \cdot N - 1)/(N-1) is increasing in N (its derivative with respect to N is (1 - \theta)/(N-1)^2 \geq 0), \pi_1^N(s_2=a|s_1=a) is increasing in N. The result for \pi_1^N(s_2 = b|s_1 = b) follows by symmetry.
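As a numerical check of the proposition, the Python sketch below evaluates equation (5) exactly with fractions. Purely for illustration, it borrows the prior from Section 3 (\Theta = \{0, 1/2, 1\} with weights 1/7, 5/7, 1/7); the function name is hypothetical, and N = None stands for Alice’s iid benchmark.

```python
from fractions import Fraction

# Illustrative prior (borrowed from the Section 3 example): theta -> pi(theta).
PRIOR = {Fraction(0): Fraction(1, 7), Fraction(1, 2): Fraction(5, 7), Fraction(1): Fraction(1, 7)}


def prob_repeat_a(prior, N=None):
    """pi_1^N(s_2 = a | s_1 = a) via equation (5); N=None is the Bayesian (iid) case."""
    norm = sum(theta * p for theta, p in prior.items())
    total = Fraction(0)
    for theta, p in prior.items():
        posterior = theta * p / norm  # equation (3)
        if N is None:
            p_next = theta  # iid draws: nothing is depleted
        else:
            # One 'a' has been removed from an urn of N balls (clamped at 0 for theta = 0,
            # whose posterior weight is zero anyway).
            p_next = Fraction(max(theta * N - 1, 0), N - 1)
        total += posterior * p_next
    return total


for N in (4, 6, 10, 100, None):
    print(N, prob_repeat_a(PRIOR, N))
```

Under this prior the probability of a repeated a rises from 11/21 at N = 4 toward the Bayesian value 9/14 as the urn grows.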

There are two interesting features of this result. First, note that Bob’s beliefs after the first signal are identical to those of an agent with proper Bayesian beliefs. Second, because he believes that the signals are drawn from an urn without replacement, Bob underestimates the probability of drawing two a‘s in a row or two b‘s in a row, by an amount that shrinks as the urn grows.

Next, I characterize Bob’s posterior beliefs about two different \theta‘s given an extreme set of signals:

Proposition: Let h_t^a be a history of t signals that all equal a, and let h_t^b be a history of t signals that all equal b. For all t > 1 and \theta, \theta' \in \Theta such that \theta > \theta', both \pi_t^N(\theta|h_t^a)/\pi_t^N(\theta'|h_t^a) and \pi_t^N(\theta'|h_t^b)/\pi_t^N(\theta|h_t^b) are strictly decreasing in N.

Proof: For even t, note that:

(6)   \begin{align*} \frac{\pi_t^N(\theta|h_t^a)}{\pi_t^N(\theta'|h_t^a)} &= \frac{\pi(\theta)}{\pi(\theta')} \cdot \left( \frac{\theta \cdot (\theta \cdot N - 1)}{\theta' \cdot (\theta' \cdot N - 1)} \right)^{\frac{t}{2}} \end{align*}

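where the common factor (N - 1) cancels because, under \theta, each pair of a signals has probability \theta \cdot (\theta \cdot N - 1)/(N - 1). Differentiating the logarithm of the base with respect to N gives \theta/(\theta \cdot N - 1) - \theta'/(\theta' \cdot N - 1) = (\theta' - \theta)/((\theta \cdot N - 1) \cdot (\theta' \cdot N - 1)), which is negative if and only if \theta > \theta'. Thus, this ratio is decreasing in N if and only if \theta > \theta'. Extending the argument to odd values of t only changes the counting convention, and symmetry yields the same result for \pi_t^N(\theta'|h_t^b)/\pi_t^N(\theta|h_t^b).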
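The proposition can also be checked by brute force. The Python sketch below multiplies Bob’s draw-by-draw probabilities directly rather than using a closed form, then prints the posterior ratio after four a‘s for two illustrative rates, \theta = 1/2 and \theta' = 1/4, with equal prior weights. These values and the function names are assumptions for illustration; N is restricted to multiples of 4 so that both urns have integer contents, and N = None denotes Alice’s iid model.

```python
from fractions import Fraction


def bob_likelihood(history, theta, N=None):
    """Probability of a signal string under Bob's model with urn size N.

    Draws are paired (1,2), (3,4), ...; within a pair Bob samples without
    replacement from an urn with theta*N 'a' balls, and the urn is refilled
    before each new pair. N=None gives iid draws (Alice).
    """
    prob = Fraction(1)
    for k, s in enumerate(history):
        if N is None or k % 2 == 0:
            p_a = Fraction(theta)  # first draw of a pair, or iid
        else:
            removed_a = 1 if history[k - 1] == "a" else 0
            p_a = Fraction(theta * N - removed_a, N - 1)
        prob *= p_a if s == "a" else 1 - p_a
    return prob


def posterior_ratio(history, theta, theta_prime, prior, N=None):
    """pi_t^N(theta | h) / pi_t^N(theta' | h); the normalizing constant cancels."""
    return (prior[theta] * bob_likelihood(history, theta, N)) / (
        prior[theta_prime] * bob_likelihood(history, theta_prime, N)
    )


hi, lo = Fraction(1, 2), Fraction(1, 4)
prior = {hi: Fraction(1, 2), lo: Fraction(1, 2)}  # illustrative equal prior
for N in (8, 12, 20, 100, None):
    print(N, posterior_ratio("aaaa", hi, lo, prior, N))
```

The ratio falls from 36 at N = 8 toward the Bayesian value (\theta/\theta')^4 = 16 as N grows, as the closed form with t = 4 predicts.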

This proposition implies that, following an extreme sequence of signals, Bob overinfers that he is facing an extreme rate. Intuitively, because Bob thinks that the signals are drawn from an urn without replacement, he is too surprised by extreme sequences: once a signal of s_t = a has been drawn in an odd period, he believes that the same signal is less likely to be drawn again in the following even period, so a streak of a‘s looks to him like stronger evidence of a high \theta than it really is.

3. Excess Volatility

I now apply this reasoning to the behavior of returns in a market populated by Bobs. First, I describe the assets. Consider a market with countably infinitely many stocks indexed by i \in \{1,2,\ldots\}. Each month, every stock realizes either a positive return, r_{i,t} = a, or a negative return, r_{i,t} = b, drawn iid from a Bernoulli distribution with parameter \theta_i \in [0,1]. Thus, in this market, conditional on \theta_i, positive returns for stock i today do not predict positive returns tomorrow, or vice versa. Suppose that a fraction \phi(1/2) = 5/7 of the stocks have \theta_i = 1/2, a fraction \phi(0) = 1/7 of the stocks have \theta_i = 0, and the remaining fraction \phi(1) = 1/7 of the stocks have \theta_i = 1.

Next, I describe the trading strategy of the Bobs, whom I index by j \in \{1,2,\ldots\}. Let z_t^{(j)} denote the list of stocks not chosen by Bob j from month 0 up to but not including month t. Each Bob then adheres to the following trading strategy:

  1. At month t=0, Bob j picks one stock i at random and holds one share of it for the next four months, t \in \{1,2,3,4\}.
  2. Then, in month t=4, Bob j sells this share, picks a new stock at random from z_4^{(j)}, buys one share, and holds onto it for the next four months, t \in \{5,6,7,8\}.
  3. Then, in month t=8, Bob j sells this share, picks a new stock at random from z_8^{(j)}, buys one share, and holds onto it for the next four months, t \in \{9,10,11,12\}.
  4. And so on\ldots

Thus, via the law of large numbers, and assuming that the Bobs’ starting months are staggered evenly across t \in \{0,1,2,3\}, each stock will have the same number of Bobs holding it at each point in time, with exactly 1/4 of those Bobs exchanging the stock for another each month.

I now consider the average beliefs of traders in a market populated by Bobs who suffer from the law of small numbers. First, in the left two columns of the table below, I compute the probability that Bayesian traders and traders suffering from the law of small numbers assign to stock i‘s return parameter being \theta_i = 1 after observing different strings of returns. Then, in the right two columns of the same table, I compute the probability that these two types of traders assign to the next return being r_{i,t}=a given these previous return realizations.

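Table: probability that \theta_i = 1 (left two columns) and probability that the next return is r_{i,t} = a (right two columns) for Bayesian traders and Bobs after different return histories.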
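Entries of this kind can be computed with the Python sketch below, which returns both beliefs for a few short return histories. The value of N behind the original table is not stated in the text, so N = 4 (the smallest urn allowed by (2)) is an assumption here, as are the function names and the choice of histories; N = None stands for the Bayesian Alice, and Bob’s pairing is assumed to start with the first return he observes.

```python
from fractions import Fraction

# Section 3 prior: fraction of stocks with each theta_i.
PRIOR = {Fraction(0): Fraction(1, 7), Fraction(1, 2): Fraction(5, 7), Fraction(1): Fraction(1, 7)}


def next_prob_a(history, theta, N):
    """Probability that the next return is 'a' given theta (N=None: iid Bayesian)."""
    if N is None or len(history) % 2 == 0:
        return theta  # first draw from a freshly refilled urn
    removed_a = 1 if history[-1] == "a" else 0
    p = Fraction(theta * N - removed_a, N - 1)
    return min(max(p, Fraction(0)), Fraction(1))  # clamp zero-likelihood cases


def likelihood(history, theta, N):
    prob = Fraction(1)
    for k, r in enumerate(history):
        p_a = next_prob_a(history[:k], theta, N)
        prob *= p_a if r == "a" else 1 - p_a
    return prob


def posterior(history, N, prior=PRIOR):
    weights = {th: p * likelihood(history, th, N) for th, p in prior.items()}
    total = sum(weights.values())
    return {th: w / total for th, w in weights.items()}


def predictive_a(history, N, prior=PRIOR):
    post = posterior(history, N, prior)
    return sum(w * next_prob_a(history, th, N) for th, w in post.items())


for h in ("a", "aa", "aaa", "ab", "aab"):
    print(f"{h:>4}  P(theta=1): Alice {float(posterior(h, None)[Fraction(1)]):.3f}, "
          f"Bob {float(posterior(h, 4)[Fraction(1)]):.3f}  "
          f"P(next=a): Alice {float(predictive_a(h, None)):.3f}, Bob {float(predictive_a(h, 4)):.3f}")
```

Under these assumptions Bob’s posterior that \theta_i = 1 exceeds Alice’s after runs of positive returns (6/11 versus 4/9 after aa, 12/17 versus 8/13 after aaa). His one-step forecast also carries the gambler’s-fallacy term whenever he expects the next return to be the second draw of a pair, so how the two forecasts compare depends on N and on where a history falls in Bob’s pairing.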

Consistent with the second proposition in Section 2 above, note that the Bobs overestimate the probability that an asset’s returns are generated by the parameter \theta_i=1 following a string of positive returns. Next, in the table below, I conclude by computing the average belief about the probability that r_{i,t}=a among both Bayesian traders (i.e., Alices) and traders suffering from the law of small numbers (i.e., Bobs), averaged over the four groups of traders who have seen zero, one, two and three returns on asset i, respectively. Again, this table reveals that for extremely positive return histories, the Bobs overinfer the probability of \theta_i = 1 and thus of r_{i,t}=a; however, for more balanced histories, the Bobs underestimate the probability that r_{i,t} = a relative to the Bayesian Alices.

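Table: average belief that r_{i,t} = a among Alices and Bobs, averaged over the four groups of traders who have seen zero, one, two and three returns on asset i.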

Thus, if all traders were Bobs, they would overreact to strings of positive returns and generate excess volatility.

Arora et al. (2012)

1. Introduction

In this post I work through the main result in Arora, Barak, Brunnermeier and Ge (2012), which formulates a simple securitization setting with I underlying assets (e.g., mortgages) and J derivative assets (e.g., CDOs) where:

  1. The seller can cheat and create non-random derivatives that will generate lemons costs for the buyer, and…
  2. With limited computational power, the buyer will not be able to detect whether or not the seller has cheated.

For a finance-lite discussion of the results, see Dick Lipton’s excellent blog post.

2. Why Use Derivatives?

I begin by walking through the motivation for using derivatives in the first place via a simple model in the spirit of DeMarzo (2005). In this sort of model, derivatives like CDOs solve an asymmetric information problem between a seller (e.g., an investment bank) and a buyer (e.g., a pension fund).

Suppose that a securities seller owns I \gg 1 mortgages, each of which pays out \$y_i defined as:

(1)   \begin{align*} y_i &= x_i + z_i, \quad i \in \mathcal{I} \end{align*}

where x_i is an iid random variable such that x_i = \$\bar{x} with probability \rho and x_i = \$0 with probability 1-\rho while z_i is Gaussian white noise with distribution \mathtt{N}(0,\sigma_z^2). For simplicity, I set \bar{x} = \$1 and \rho = 1/2 in the analysis below.

State space tree for x.

The model relies on two main assumptions:

  1. The seller knows \$x_i for each i \in \mathcal{I} while the buyer does not.
  2. The seller would prefer \$\delta in cash to \$1 in mortgages with \delta \in (0,1).

The first assumption defines the information asymmetry. To make the problem interesting, it also has to be the case that the seller cannot credibly reveal his information prior to trading; e.g., he can’t just point out which mortgages are bad apples in an email. The second assumption gives the seller a good-faith reason to get rid of the mortgages. It can be interpreted as saying that the seller is less liquid than the buyer and would prefer to have more cash.

If a buyer considered purchasing a “random” collection of D < I/2 mortgages from the seller, he would worry that the seller would fill this collection with bad mortgages, so that the buyer would receive a collection of mortgages worth \$0 rather than its “fair value” of \$D/2. In such a world, the buyer would never purchase mortgages from the seller, and the seller would be forced to hold onto all of the mortgages, which he values at only \$\delta \cdot I / 2 due to his liquidity concerns. This loss of \$(1 - \delta) \cdot I / 2 is the core of the asymmetric information problem.

However, a smart seller can eliminate the asymmetric information problem entirely via careful security design. Suppose that he (a) offers to sell a security with two tranches where the senior tranche has a face value of \$I/2 and (b) holds onto the junior tranche. After agreeing to accept the first \$I/2 in losses by holding onto the junior tranche, the seller essentially converts the remaining payout from the bundle of mortgages into a riskless asset as I \nearrow \infty, since by the law of large numbers the share of good mortgages concentrates at \rho and the white noise terms z_i cancel out on average. Thus, by securitizing the mortgages the seller is able to get the “fair” value of his assets in spite of the asymmetric information problem:

(2)   \begin{align*} \sum_{i=1}^I x_i &\approx \rho \cdot I \cdot \bar{x} = I/2 = \mathtt{E} \left[ \sum_{i=1}^I y_i \right] \end{align*}
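A quick Monte Carlo sketch of the diversification step: it draws a pool of I mortgages with payouts y_i = x_i + z_i and reports the average payout per mortgage, which concentrates around \rho \cdot \bar{x} = 1/2 as I grows. The noise level \sigma_z = 1 and the function name are assumptions for illustration.

```python
import random


def average_payout(I, rho=0.5, x_bar=1.0, sigma_z=1.0, seed=0):
    """Average of y_i = x_i + z_i over I mortgages.

    x_i equals x_bar with probability rho and 0 otherwise; z_i ~ N(0, sigma_z^2).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(I):
        x = x_bar if rng.random() < rho else 0.0
        total += x + rng.gauss(0.0, sigma_z)
    return total / I


for I in (10, 1_000, 100_000):
    print(I, round(average_payout(I), 4))
```

The standard deviation of the average is \sqrt{\rho(1-\rho)\bar{x}^2 + \sigma_z^2}/\sqrt{I}, so the per-mortgage payout becomes nearly riskless in the limit.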

3. New Framework

In Arora et al. (2012), the seller still wants to sell I mortgages, each of which pays off \bar{x} = 1 with probability \rho = 1/2. However, Arora et al. (2012) adjust this standard framework in three key ways:

  1. The seller cannot hold the entire junior tranche.
  2. The buyer and seller engage in a zero-sum transaction.
  3. The seller creates J binary derivative securities (henceforth, CDOs) from these I underlying mortgages.

The first assumption means that the seller cannot fix the asymmetric information problem via security design in the same manner as in DeMarzo (2005). The second assumption removes any possibility that derivative creation might be socially beneficial: in Arora et al. (2012), the seller gets utility from the amount of money he can extract from the buyer via clever security design. The third and final assumption is the key step that allows computational complexity to enter the model. Specifically, each CDO’s payout now depends on D \leq I of the mortgages in the entire pool. Each of these J new CDOs pays out an amount \$V_j, where V_j = \$\bar{v} if no more than (D + \phi)/2 of the underlying mortgages default and V_j = \$0 otherwise, with:

(3)   \begin{align*} \bar{v} &= \left( \frac{D - \phi}{2 \cdot D} \right) \cdot \frac{I}{J} \end{align*}
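To make the payout rule concrete, the sketch below computes \bar{v} from (3) and the exact probability that a CDO built from D randomly chosen mortgages pays out, assuming each of its D mortgages defaults independently with probability 1/2 (the \rho = 1/2 case). The function name and parameter values are illustrative.

```python
from math import comb


def cdo_payout(D, phi, I, J):
    """Return (v_bar, P[pays], E[V_j]) for a CDO of D random mortgages.

    The CDO pays v_bar if at most (D + phi)/2 of its mortgages default,
    where each mortgage defaults independently with probability 1/2.
    """
    v_bar = (D - phi) / (2 * D) * I / J
    threshold = (D + phi) / 2
    paying = sum(comb(D, k) for k in range(D + 1) if k <= threshold)
    p_pay = paying / 2 ** D
    return v_bar, p_pay, v_bar * p_pay


print(cdo_payout(D=100, phi=20, I=1000, J=1000))
```

Raising \phi makes the CDO more likely to pay but lowers its face value \bar{v}, which is the trade-off built into (3).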

Given the third assumption, the seller’s assignment of the I mortgages to the J CDOs defines a bipartite graph as illustrated below:

Seller assignment bipartite graph.

Now, suppose that the seller has private information that a subset \mathcal{L} of L of the I mortgages are lemons that are guaranteed to pay out \$0. Here, private information means that the seller knows which L mortgages have a payout of \$0, while the buyer knows that the seller has this information but not which mortgages are lemons. To increase his profit, the seller might carefully design this graph using his knowledge of which mortgages are lemons in order to reduce the value of the CDOs relative to their “fair” value of D/2.

Definition (Lemons Cost): Let \kappa_j(L) denote the lemons cost to the buyer if the seller knows a set \mathcal{L} of the mortgages will payout x_i = 0:

(4)   \begin{align*} \kappa_j(L) &= \mathtt{E}\left[ V_j \right] - \underset{\mathcal{L} \subseteq \mathcal{I}, \ |\mathcal{L}| = L}{\mathtt{min}} \left\{ \mathtt{E}\left[ \ V_j \ \middle| \ x_i = 0, \ \forall i \in \mathcal{L} \ \right] \right\} \end{align*}

where \kappa_j has units of dollars.

Thus, the lemons cost can be thought of as the largest amount by which a CDO’s value could decline if the seller knew that a set \mathcal{L} of lemons existed in the mortgage pool and tried to cheat the buyer out of as much money as possible. For example, suppose that in the model from the section above the seller didn’t actually know each and every x_i but instead just knew that a subset \mathcal{L} of L of the I mortgages would have x_i = \$0. Then, the fair value of the entire bundle of mortgages would have been (I - L)/2 = I/2 - L/2, so that the lemons cost would have been \kappa(L) = \$L/2.

The key insight of this paper is that figuring out whether or not the seller has randomly assigned mortgages to CDOs is equivalent to the densest subgraph problem, for which no polynomial time algorithm is known and which is conjectured to be computationally hard. In other words, if the buyer is limited to polynomial time algorithms, then the seller can foist some lemons costs on the buyer without being detected. More formally, the buyer faces the following problem:

Definition (Densest Subgraph Problem): The densest subgraph problem is to distinguish between the following two distributions, \mathtt{Rnd} and \mathtt{Lmn}, over the bipartite graph above:

  1. \mathtt{Rnd}: The seller chooses D random mortgages for every CDO.
  2. \mathtt{Lmn}: Nature selects a random set \mathcal{L} of L assets with x_i = \$0 payout. The seller then selects a subset \mathcal{B} of B CDOs (i.e., boobytrapped CDOs) and a number of lemons \theta to plant in each boobytrapped CDO. For every non-boobytrapped CDO, the seller chooses D random mortgages. For every boobytrapped CDO, the seller randomly chooses D - \theta mortgages from the total population of mortgages and \theta mortgages from the set \mathcal{L} of mortgages with known x_i = \$0.

If I = o(J \cdot D) and (B \cdot \theta^2 / L) = o(J \cdot D^2 / I), then there is no polynomial time algorithm to distinguish between the distributions \mathtt{Rnd} and \mathtt{Lmn}.

Roughly speaking, the densest subgraph problem is tantamount to figuring out whether or not the seller planted \theta mortgages with no payout in each of B of the J CDOs he issued. The seller then wants to maximize the total lemons costs he can extract from the buyer, subject to the constraint that there is no polynomial time algorithm that the buyer can use to detect his cheating. Note that in the simple DeMarzo (2005) setting, if the seller could commit to issuing a CDO with a “random” collection of D mortgages and published the collection of mortgage IDs, the buyer could check whether or not these D mortgage IDs were in fact randomly selected using a polynomial time algorithm. The challenge in the Arora et al. (2012) setting comes from the fact that the same mortgage i can show up in several CDOs.
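Both distributions are easy to sample; the hard part is telling them apart from the graph alone. In the Python sketch below (all names and sizes are illustrative), each CDO is a set of mortgage IDs. The D - \theta non-planted mortgages are drawn from the mortgages not already planted, a small simplification that keeps every CDO at exactly D distinct mortgages. The seller knows which IDs are in \mathcal{L}, while the buyer sees only the sets; these toy sizes are far from the asymptotic regime of the hardness conjecture.

```python
import random


def sample_cdos(I, J, D, lemons=frozenset(), boobytrapped=(), theta=0, seed=0):
    """Draw the mortgage-to-CDO bipartite graph as a list of J sets of mortgage IDs.

    With no boobytrapped CDOs this is the Rnd distribution. Otherwise each CDO in
    `boobytrapped` gets theta mortgages drawn from `lemons` plus D - theta drawn
    from the remaining pool (the Lmn distribution).
    """
    rng = random.Random(seed)
    trapped = set(boobytrapped)
    cdos = []
    for j in range(J):
        if j in trapped:
            planted = set(rng.sample(sorted(lemons), theta))
            others = [i for i in range(I) if i not in planted]
            cdos.append(planted | set(rng.sample(others, D - theta)))
        else:
            cdos.append(set(rng.sample(range(I), D)))
    return cdos


I, J, D, L = 1000, 200, 50, 100
lemons = frozenset(random.Random(1).sample(range(I), L))  # nature's draw
rnd = sample_cdos(I, J, D)
lmn = sample_cdos(I, J, D, lemons, boobytrapped=range(20), theta=10)
# Lemon counts per CDO are visible only to someone who knows the lemon set.
print(max(len(c & lemons) for c in rnd), min(len(c & lemons) for c in lmn[:20]))
```

With L/I = 1/10, a random CDO of 50 mortgages holds about 5 lemons on average, while each boobytrapped CDO here holds at least 10.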

Above, I use Landau notation:

Definition (Little-o Notation): We say that “f is little-o of h as x approaches x_0” and write f(x) = o(h(x)) as x \to x_0 to mean that:

(5)   \begin{align*} \lim_{x \to x_0} \frac{f(x)}{h(x)} &= 0 \end{align*}

Some simple math reveals that the seller only needs to fix about \theta \sim \sqrt{D} of the assets in each CDO to affect its payout: the expected number of defaults would be (D + \theta)/2, a shift of \theta/2 relative to a random CDO, while the standard deviation would be (1/2) \cdot \sqrt{D - \theta}, which is of order \sqrt{D}. Thus, the lemons cost for a boobytrapped CDO will be \kappa_j(L) = \bar{v} \cdot \Delta_\theta, where \Delta_\theta = \mathtt{N}[ (\theta - \phi)/(2 \cdot \sqrt{D})] is the increase in the probability that a CDO does not pay out due to the seller’s boobytrap and \mathtt{N}[\cdot] denotes the standard normal cumulative distribution function. The seller then chooses the number of CDOs to boobytrap, B, and the number of lemons to plant in each boobytrapped CDO, \theta, in order to maximize his utility as defined below:

(6)   \begin{align*} U &= \underset{\{B,\theta\}}{\mathtt{max}} \left\{ \ B \cdot \bar{v} \cdot \Delta_\theta \ \right\} \\ &\text{ s.t. no polynomial time algorithm\ldots} \end{align*}

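Since \Delta_\theta is convex for \theta < \phi and concave for \theta \geq \phi (the standard normal cumulative distribution function is convex for negative arguments and concave for positive ones), the optimal choice of \theta is either \theta = 0 or \theta = c \cdot \sqrt{D} for some small constant c.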
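To see why \theta of order \sqrt{D} suffices, the sketch below computes the exact probability that a CDO fails when \theta of its D mortgages are planted lemons and the remaining D - \theta default independently with probability 1/2. This is a simplification of \mathtt{Lmn}, which draws those D - \theta mortgages from the whole pool. With \phi = 0 and \theta = \sqrt{D}, the increase in the failure probability stays roughly constant as D grows instead of vanishing. Function names and parameter values are illustrative.

```python
from math import comb, isqrt


def fail_prob(D, phi, theta):
    """P[more than (D + phi)/2 of the D mortgages default] with theta planted lemons."""
    n = D - theta  # mortgages that default with probability 1/2
    threshold = (D + phi) / 2
    bad = sum(comb(n, k) for k in range(n + 1) if theta + k > threshold)
    return bad / 2 ** n


def boobytrap_gain(D, phi, theta):
    """Increase in the CDO's failure probability relative to planting no lemons."""
    return fail_prob(D, phi, theta) - fail_prob(D, phi, 0)


for D in (100, 400, 1600):
    theta = isqrt(D)  # theta = c * sqrt(D) with c = 1
    print(D, theta, round(boobytrap_gain(D, 0, theta), 3))
```

Holding \theta fixed instead (say \theta = 5) makes the gain shrink toward zero as D grows, since a constant shift is swamped by the \sqrt{D}-sized noise.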

4. Main Result

The main result of the paper is to characterize the seller’s utility U given this optimization program:

Theorem: If \theta - \phi > 3 \cdot \sqrt{D} and \theta/D \gg L/I, the seller will earn a utility:

(7)   \begin{align*}  U &\geq \left(1 - 2 \cdot \mathtt{N}[-\phi/(2 \cdot \sqrt{D})] - o(1) \right) \cdot B \cdot \bar{v} \approx L \cdot \sqrt{I/J} \end{align*}

Thus, if the seller issues roughly the same number of CDOs as there are mortgages (J \approx I), he can earn roughly \$1 for each additional lemon that he discovers in the underlying pool of mortgages without the buyer being able to figure out whether or not he has cheated and boobytrapped some of the CDOs.

Proof: The strategy of the proof is to first take any B and \theta that satisfy the theorem conditions and compute the lemons cost for a boobytrapped CDO, and then to use the link between the default rate on mortgages and the default rate on CDOs to bound the expected payout of a non-boobytrapped CDO.

Suppose that B and \theta satisfy the bounds under which no polynomial time algorithm can distinguish \mathtt{Rnd} from \mathtt{Lmn}. For each boobytrapped CDO j \in \mathcal{B}, let s_j denote its total number of defaulted mortgages. Since the CDO is boobytrapped, \theta of its mortgages will always default, and each of the remaining D - \theta defaults with probability 1/2, so the expectation \mathtt{E}[s_j] can be computed as:

(8)   \begin{align*} \mathtt{E}[s_j] = \frac{D - \theta}{2} + \theta \end{align*}

Given that the number of assets in each CDO, D, is large enough for the normal approximation (central limit theorem) to hold, we would then have that:

(9)   \begin{align*} \mathtt{Pr}\left[ \ s_j \geq \left(\frac{D + \phi}{2}\right) \ \right] &= 1 - \mathtt{N}\left[-\frac{\phi}{2 \cdot \sqrt{D}}\right] \end{align*}

Next, I derive a bound on the expected payout of a non-boobytrapped CDO. Suppose that the expected number of defaulted assets in each non-boobytrapped CDO is t. Then t has to satisfy the relationship:

(10)   \begin{align*} B \cdot \left( \frac{D + \theta}{2} \right) + \left( J - B \right) \cdot t &= \left( \frac{I + L}{2} \right) \cdot \frac{J \cdot D}{I} \end{align*}

Both sides of the above equation count the expected total number of defaulted mortgage slots across all J CDOs. The left hand side is the expected number of defaults in the boobytrapped CDOs, B \cdot (D + \theta)/2, plus the expected number of defaults in the non-boobytrapped CDOs, (J - B) \cdot t. The right hand side is the expected number of defaulted mortgages, L + (I - L)/2 = (I + L)/2, times the average number of CDOs into which each mortgage is securitized, J \cdot D/I. Thus, by solving for t and using J/(J - B) \geq 1, we can derive the inequality below:

(11)   \begin{align*} t &= \left( \frac{I + L}{2} \right) \cdot \frac{J \cdot D}{\left( J - B \right) \cdot I} - \frac{B}{J - B} \cdot \left( \frac{D + \theta}{2} \right) \\ &= \frac{D}{2} \cdot \left( \frac{(I + L) \cdot J}{(J - B) \cdot I} \right) - \frac{B \cdot D}{2 \cdot (J - B)} - \frac{B \cdot \theta}{2 \cdot (J - B)} \\ &= \frac{D}{2} \cdot \left( \frac{I\cdot J}{(J - B) \cdot I} \right) + \frac{D}{2} \cdot \left( \frac{L \cdot J}{(J - B) \cdot I} \right) - \frac{B \cdot D}{2 \cdot (J - B)} - \frac{B \cdot \theta}{2 \cdot (J - B)} \\ &= \frac{D}{2} + \frac{D \cdot L \cdot J}{2 \cdot (J - B) \cdot I} - \frac{B \cdot \theta}{2 \cdot (J - B)} \\ &\geq \frac{D}{2} + \frac{D \cdot L}{2 \cdot I} - \frac{B \cdot \theta}{2 \cdot (J - B)} \end{align*}

Thus, the probability that the number of defaulted assets in a non-boobytrapped CDO exceeds (D+\phi)/2 can be computed as:

(12)   \begin{align*} \mathtt{N}\left[ - 3 - \frac{B \cdot \theta}{2 \cdot (J - B) \cdot D} \right] &= \mathtt{N}\left[ - \frac{\phi}{2 \cdot \sqrt{D}} \right] - \left. \frac{d\mathtt{N}}{dx} \right|_{x = -3} \cdot \left( \frac{B \cdot \theta}{2 \cdot (J - B) \cdot D} \right) \\ &= \mathtt{N}\left[ - \frac{\phi}{2 \cdot \sqrt{D}} \right] - O \left( \frac{B \cdot \theta}{2 \cdot (J - B) \cdot D} \right) \end{align*}

Thus, the expected number of securities (boobytrapped and non-boobytrapped) that yield no payout is at least J \cdot \mathtt{N}[ - \phi/(2 \cdot \sqrt{D})] + (1 - 2 \cdot \mathtt{N}[ - \phi/(2 \cdot \sqrt{D})]) \cdot B - o(B), which is roughly (1 - 2 \cdot \mathtt{N}[ - \phi/(2 \cdot \sqrt{D})]) \cdot B larger than in the case with no boobytrapping.
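Because the default-counting algebra in the proof is easy to slip on, here is an exact-arithmetic check in Python. It solves the accounting identity for t, counting (I + L)/2 expected defaulted mortgages that each appear in J \cdot D/I CDOs on average, and compares the result with the closed form and the lower bound on t. The parameter values and function names are illustrative.

```python
from fractions import Fraction as F


def solve_t(I, J, D, L, B, theta):
    """Expected defaults per non-boobytrapped CDO from the default-counting identity."""
    total_defaulted_slots = F(I + L, 2) * F(J * D, I)
    trapped_slots = B * F(D + theta, 2)
    return (total_defaulted_slots - trapped_slots) / (J - B)


def closed_form_t(I, J, D, L, B, theta):
    return F(D, 2) + F(D * L * J, 2 * (J - B) * I) - F(B * theta, 2 * (J - B))


def lower_bound_t(I, J, D, L, B, theta):
    return F(D, 2) + F(D * L, 2 * I) - F(B * theta, 2 * (J - B))


args = dict(I=1000, J=1000, D=100, L=50, B=10, theta=10)
print(solve_t(**args), closed_form_t(**args), lower_bound_t(**args))
```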

Notes: Gabaix (2012)

1. Introduction

In this post, I review the sparsity based model of bounded rationality introduced in Gabaix (2011) and then extended in Gabaix (2012).

In the baseline framework presented in Gabaix (2011), a boundedly rational agent faces the problem of which action to choose in order to maximize his utility when his optimal action may depend on many different variables. The agent then chooses an action by following a two-step algorithm: first, he chooses a sparse representation of the world in which he completely ignores many of the variables that might affect his optimal action; second, he uses this endogenously chosen sparse representation of the world to choose a boundedly rational action. Gabaix (2012) then shows how to use this sparsity based framework to solve a dynamic programming problem.

2. Illustrative Example

In this section, I work through a simple example showing how to build bounded rationality into a decision maker’s problem in a tractable way. I begin by defining the problem. Consider the manager of a car factory, call him Bill, who gets to choose how many cars the factory should produce. Let V(a;\mu,x) denote Bill’s value function, in units of \mathtt{utils}, given an action a, the number of cars to produce:

(1)   \begin{align*} \max_a V(a;\mu,x) &= \max_a \left\{ \ - \frac{\gamma}{2} \cdot \left( \ a - \sum_{n=1}^N \mu_n \cdot x_n \right)^2 \ \right\} \end{align*}

Here, the vector x denotes a collection of factors that could enter into Bill’s decision process. For instance, he might worry about last month’s demand, x_1, the current US GDP, x_2, the recent increase in the cost of brake pads, x_{30}, and the completion of the new St. Croix River bridge in Minneapolis, MN, x_{500}. Likewise, the vector \mu denotes how much weight Bill should place on each of the N different elements that might enter into his decision process. Each of the elements \mu_n is in units that convert decision factors into units of \mathtt{cars}. So, for example, \mu_{500} would have units of \mathtt{cars}/\mathtt{bridge \ completions} while \mu_2 would have units of \mathtt{cars}/\$. Finally, \gamma is a constant with units of \mathtt{utils}/\mathtt{cars}^2 which balances the equation.

Next, consider Bill’s optimal and possible sub-optimal action choices. If Bill is completely unconstrained and can pick any a whatsoever, he should choose:

(2)   \begin{align*} a(\mu;x) &= \sum_{n=1}^N \mu_n \cdot x_n \end{align*}

where a(\cdot;x) has units of \mathtt{cars}. However, suppose that there were some constraints on Bill’s problem, so that he could not fully adjust his choice of how many cars to produce in response to every little piece of information in the vector x. Let a(m;x) denote his choice of how many cars to produce, where m_n = 0 for most n \in \{1,\ldots,N\}:

(3)   \begin{align*} a(m;x) &= \sum_{n=1}^N m_n \cdot x_n \end{align*}

For instance, if m_2 \neq 0 but m_{30} = 0, then Bill will adjust the number of cars he produces in response to a change in US GDP but not in response to a change in the price of brake pads. Thus, Bill’s choice of how many cars to produce, a, can be rewritten as a choice of how much weight to place on each potential decision factor x_n.

In order to complete the construction of Bill’s boundedly rational decision problem, I now have to define a loss function for Bill which trades off the benefits of choosing a weighting vector m that is closer to the optimal choice \mu against the cognitive costs of thinking about all of the nitty-gritty details of the N-dimensional vector of decision factors x. As a benchmark, suppose that there were no cognitive costs. In such a world, Bill would choose a = a(\mu;x), since he would suffer a quadratic loss from deviating from this optimal strategy, defined by the function L(m,\mu) below, but would get no compensating “cognitive” gain from not having to think about how the construction of a bridge in Minneapolis should affect his production decision:

(4)   \begin{align*} L(m,\mu) &= \mathtt{E} \left[ \ V(a(\mu;x); \mu,x) - V(a(m;x); \mu,x) \ \right] \\ &= \frac{\gamma}{2} \cdot \mathtt{E} \left[ \ \left( \sum_{n=1}^N \left( m_n - \mu_n \right) \cdot x_n \right)^2 \ \right] \\ &= \frac{\gamma}{2} \cdot \sum_{n=1}^N \left( m_n - \mu_n \right)^2 \cdot 1_{\scriptscriptstyle \mathtt{dim}[x_n]}^2 \end{align*}

where the last line requires the x_n to be uncorrelated with mean zero and unit variance, and the term 1_{\scriptscriptstyle \mathtt{dim}[x_n]}^2 is a placeholder which helps balance the units in the equation. One method for incorporating cognitive costs into Bill’s decision problem would be to charge him \kappa units of utility per unit of |m_n|^\alpha, i.e., per unit of emphasis on each decision factor. So, for example, if \alpha = 2 and Bill increased his production by 5 \mathtt{cars}/\$1\mathtt{B} in GDP growth, then Bill would pay a cognitive cost of \kappa \cdot 5^2, where \kappa has units of \mathtt{utils}. Below, I formulate this boundedly rational loss function as a function of \alpha, which I denote L^{\scriptscriptstyle \mathtt{BR}}(m,\mu;\alpha):

(5)   \begin{align*} L^{\scriptscriptstyle \mathtt{BR}}(m,\mu;\alpha) &= \min_{m \in \mathbb{R}^N} \left\{ \ \frac{\gamma }{2} \cdot \sum_{n=1}^N (m_n - \mu_n)^2 \cdot 1_{\scriptscriptstyle \mathtt{dim}[x_n]}^2 + \kappa \cdot \sum_{n=1}^N \left( |m_n| \cdot 1_{\scriptscriptstyle \frac{\mathtt{dim}[x_n]}{\mathtt{cars}}} \right)^\alpha \ \right\} \end{align*}

where the term 1_{\scriptscriptstyle \mathtt{dim}[x_n] / \mathtt{cars}} is a constant with units \mathtt{dim}[x_n] / \mathtt{cars} that balances the equation.
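The quadratic-loss step in (4) can be checked by simulation. The sketch below assumes, for illustration, that the x_n are independent standard normal draws (so the unit placeholders equal 1) and sets \gamma = 1; it compares a Monte Carlo estimate of \mathtt{E}[V(a(\mu;x)) - V(a(m;x))] with the closed form. The function names are hypothetical.

```python
import random


def value(a, mu, x, gamma=1.0):
    """V(a; mu, x) = -(gamma/2) * (a - sum_n mu_n x_n)^2."""
    target = sum(u * xn for u, xn in zip(mu, x))
    return -0.5 * gamma * (a - target) ** 2


def simulated_loss(m, mu, gamma=1.0, draws=100_000, seed=0):
    """Monte Carlo estimate of L(m, mu) with x_n iid N(0, 1)."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(draws):
        x = [rng.gauss(0.0, 1.0) for _ in mu]
        a_opt = sum(u * xn for u, xn in zip(mu, x))    # a(mu; x)
        a_sparse = sum(w * xn for w, xn in zip(m, x))  # a(m; x)
        acc += value(a_opt, mu, x, gamma) - value(a_sparse, mu, x, gamma)
    return acc / draws


def closed_form_loss(m, mu, gamma=1.0):
    return 0.5 * gamma * sum((w - u) ** 2 for w, u in zip(m, mu))


m, mu = [0.0, 1.0, 0.5], [1.0, 1.0, 0.0]
print(simulated_loss(m, mu), closed_form_loss(m, mu))
```

If the x_n were correlated, the cross terms would not average out and the loss would no longer be a simple sum over factors.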

The key idea is that Bill’s decision problem is both convex and sparse only when \alpha = 1. Below, I give 3 examples which illustrate this fact:

Example (Quadratic Complexity Costs): First, consider the quadratic case, \alpha = 2. Taking the partial derivative of L^{\scriptscriptstyle \mathtt{BR}}(m,\mu;2) with respect to an arbitrarily chosen dimension n \in \{1,\ldots,N\} and setting \gamma = 1_{\scriptscriptstyle \mathtt{utils}/\mathtt{cars}^2} for simplicity, we obtain the optimality condition:

(6)   \begin{align*} 0 &= \partial_{m_n} L^{\scriptscriptstyle \mathtt{BR}}(m,\mu;2) \\ &= (m_n - \mu_n) + 2 \cdot \kappa \cdot m_n \end{align*}

This yields an internal optimum for Bill’s choice of m_n:

(7)   \begin{align*} m_n &= \frac{\mu_n}{1 + 2 \cdot \kappa} \end{align*}

While this solution is easy to compute, setting \alpha = 2 doesn’t yield any sparsity, as m_n \neq 0 whenever \mu_n \neq 0.
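A brute-force check of (7): the sketch below minimizes one coordinate of (5), with \gamma = 1 and the unit placeholders set to 1, over a fine grid and compares the result with the closed-form shrinkage rule. The grid search is only a sanity check, and the function names are hypothetical.

```python
def br_objective(m, mu, kappa, alpha):
    """One coordinate of (5) with gamma = 1: 0.5*(m - mu)^2 + kappa*|m|^alpha."""
    return 0.5 * (m - mu) ** 2 + kappa * abs(m) ** alpha


def grid_argmin(mu, kappa, alpha, lo=-10.0, hi=10.0, steps=40_001):
    """Brute-force minimizer over an evenly spaced grid (a check, not a solver)."""
    step = (hi - lo) / (steps - 1)
    grid = (lo + i * step for i in range(steps))
    return min(grid, key=lambda m: br_objective(m, mu, kappa, alpha))


def shrink(mu, kappa):
    """Equation (7): the alpha = 2 optimum."""
    return mu / (1 + 2 * kappa)


for mu in (3.0, -2.0, 0.1):
    print(mu, shrink(mu, 0.5), grid_argmin(mu, 0.5, 2))
```

Even the small \mu_n = 0.1 is only shrunk (to 0.05 with \kappa = 0.5), never set to zero.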

Graphically, in the above example, the boundedly rational agent will choose a weight m_n at the point on the x-axis in the figure below where the slope of the solid pink |m_n|^2 curve is exactly equal to the slope of the dashed black line.

The solid colored lines represent the cognitive costs faced by a boundedly rational agent with kappa = 1 when alpha = 2, 3, 4, 5 respectively. The dashed black line represents the gain to the boundedly rational agent to increasing the weight m_n on a particular decision factor when gamma = 1.

Example (Fixed Complexity Costs): On the other hand, let’s now think about the case where \alpha = 0, with the convention that |m_n|^\alpha = 1_{m_n \neq 0}. Then I would want to again set:

(8)   \begin{align*} 0 &= \partial_{m_n} L^{\scriptscriptstyle \mathtt{BR}}(m,\mu;0) \end{align*}

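However, this problem is no longer convex: the cost of moving m_n a little bit away from 0 always outweighs the incremental benefit, so Bill must instead compare the two candidates m_n = 0, with loss \mu_n^2/2, and m_n = \mu_n, with cognitive cost \kappa. He adjusts whenever \mu_n^2/2 \geq \kappa, so I get the solution: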

(9)   \begin{align*} m_n &=  \begin{cases} \mu_n &\text{ if } |\mu_n| \geq \sqrt{2 \cdot \kappa} \\ 0 &\text{ else } \end{cases} \end{align*}

While well posed, this non-convex problem is computationally hard to solve in general, since the search over which variables to include grows combinatorially (there are 2^N possible supports) rather than linearly in the number of variables N; e.g., see example 6.4 in Boyd (2004) describing the regressor selection problem.
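In Bill’s problem the loss is separable across factors, so (9) can be read off one coordinate at a time. A solver that does not exploit that structure must instead search over which factors to keep; the sketch below does so by enumerating all 2^N supports (with \gamma = 1 and unit placeholders). It agrees with the hard-threshold rule (9), but its running time doubles with every added factor. The function names are hypothetical.

```python
import math
from itertools import product


def objective(m, mu, kappa):
    """0.5 * sum (m_n - mu_n)^2 + kappa * #{n : m_n != 0}  (gamma = 1, alpha = 0)."""
    fit = 0.5 * sum((a - b) ** 2 for a, b in zip(m, mu))
    return fit + kappa * sum(1 for a in m if a != 0)


def brute_force(mu, kappa):
    """Enumerate all 2^N supports; on a given support the best weights are m_n = mu_n."""
    best = None
    for support in product((False, True), repeat=len(mu)):
        m = [u if keep else 0.0 for u, keep in zip(mu, support)]
        val = objective(m, mu, kappa)
        if best is None or val < best[0]:
            best = (val, m)
    return best[1]


def hard_threshold(mu, kappa):
    """Equation (9): keep mu_n only if |mu_n| >= sqrt(2 * kappa)."""
    cut = math.sqrt(2 * kappa)
    return [u if abs(u) >= cut else 0.0 for u in mu]


mu = [3.0, 0.5, -2.0, 1.0, -0.2]
print(brute_force(mu, 1.0), hard_threshold(mu, 1.0))
```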

Graphically, we can see the intuition for the problem posed by non-convexity for \alpha \in (0,1) in the figure below. For the solid blue line representing the \alpha = 1/5 case, an incremental increase m_n: 0 \to \epsilon, where \epsilon is just marginally greater than 0, has a cognitive cost much greater than its benefit; however, for m_n: 0 \to \mathcal{E} where \mathcal{E} \gg 0, the benefit of increasing the weighting factor m_n outweighs its cognitive cost. The \alpha = 0 case can be seen as an even more extreme limiting case: as \alpha falls, the cost curve becomes increasingly kinked at 0 and eventually becomes a flat line at height \kappa = 1 for all m_n \neq 0.

The solid colored lines represent the cognitive costs faced by a boundedly rational agent with kappa = 1 when alpha = 1/2, 1/3, 1/4, 1/5 respectively. The dashed black line represents the gain to the boundedly rational agent from increasing the weight m_n on a particular decision factor when gamma = 1.

Looking at the previous 2 examples, we can follow the Goldilocks logic and see that there is a particular parameterization of \alpha which yields both sparsity and convexity: namely, \alpha = 1.

Example (Linear Complexity Costs): Finally, consider the case where \alpha = 1. Here, we find that:

(10)   \begin{align*} 0 &= \partial_{m_n} L^{\scriptscriptstyle \mathtt{BR}}(m,\mu;1) \\ &= - (m_n - \mu_n) - \kappa \cdot \mathtt{sgn} [m_n] \end{align*}

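where \mathtt{sgn}[\cdot] denotes the sign function, which returns the sign of its real argument. Since |m_n| is not differentiable at m_n = 0, the condition there is that 0 lie in the subdifferential of the objective, which holds whenever |\mu_n| \leq \kappa. Thus, there are 3 different cases for the optimal choice of m_n: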

(11)   \begin{align*} m_n &=  \begin{cases} \mu_n + \kappa &\text{ if } \mu_n \leq - \kappa \\ 0 &\text{ if } |\mu_n| < \kappa \\ \mu_n - \kappa &\text{ if } \mu_n \geq \kappa \end{cases} \end{align*}

Thus, the setting where \alpha = 1 is both analytically tractable and sparse as any variables with |\mu_n| < \kappa will be ignored by the decision maker.
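The rule in (11) is the soft-thresholding operator. The minimal Python sketch below implements it and checks it against a brute-force grid search over the objective implied by (10), -\frac{1}{2} \cdot (m_n - \mu_n)^2 - \kappa \cdot |m_n|. The grid bounds and spacing are arbitrary choices for illustration.

```python
import numpy as np


def soft_threshold(mu, kappa):
    """Rule (11): shrink each mu_n toward 0 by kappa, and set it to 0 when |mu_n| < kappa."""
    mu = np.asarray(mu, dtype=float)
    return np.sign(mu) * np.maximum(np.abs(mu) - kappa, 0.0)


def grid_argmax(mu_n, kappa):
    """Brute-force check: maximize -0.5 (m - mu_n)^2 - kappa |m| over a fine grid of m."""
    grid = np.linspace(-5.0, 5.0, 100001)  # spacing of 1e-4
    objective = -0.5 * (grid - mu_n) ** 2 - kappa * np.abs(grid)
    return grid[np.argmax(objective)]


mu = [2.0, 0.5, -1.8]
print(soft_threshold(mu, kappa=1.0))            # shrinks 2.0 and -1.8 by kappa, zeroes out 0.5
print([grid_argmax(x, kappa=1.0) for x in mu])  # agrees up to the grid spacing
```

Unlike the \alpha = 0 case, the same shrinkage also keeps the problem convex when \Lambda couples the weights, which is what makes \alpha = 1 attractive.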

3. Static Problem Formulation

In the above example, I showed how to embed sparsity into Bill’s decision problem using a linear cost function. However, Bill’s problem was extremely simple in the sense that he had a linear-quadratic value function. Can the intuition developed in this simple example be extended to more complicated examples with more elaborate utility functions? In the simple example, I found a clean knife-edge result: \alpha=1 is the only power coefficient which delivers both sparsity and a tractable solution. However, this result depended on Bill’s utility gain from increasing his weight m_n on decision factor n being linear (e.g., look at the black dashed line in the 2 figures above). Will this same intuition regarding \alpha = 1 hold for more complicated utility functions?

Gabaix (2011) shows that the intuition indeed holds for a wide variety of utility specifications after using an appropriate quadratic approximation of the problem around a reference representation of the world \bar{m} and action \bar{a}. For instance, in Bill’s problem above, this would be like allowing his value function V(a;m,x) to be non-quadratic and then approximating his problem as quadratic around a reference \bar{m} = \begin{bmatrix} \mu_1 & \mu_2 & 0 & \cdots & 0 \end{bmatrix}, implying that by default he ignores all variables except last period’s demand and the current value of GDP, and a reference number of cars to produce of \bar{a} = 100.

In order to construct this approximation, I need to define 3 objects. First, for an arbitrary value function V(a;m,x) define the partial derivatives V_{a,m} and V_{a,a} as:

(12)   \begin{align*} V_{a,m} &= \left. \frac{\partial^2 V}{\partial a \cdot \partial m}(a;m,x) \right|_{\bar{a},\bar{m}} \\ V_{a,a} &= \left. \frac{\partial^2 V}{(\partial a)^2}(a;m,x) \right|_{\bar{a},\bar{m}} \end{align*}

where V_{a,a} is negative definite, implying that V(a;m,x) is strictly concave in a in the neighborhood of (\bar{a},\bar{m}). I then use these 2 matrices to define a weighting matrix \Lambda which measures how costly, to second order, it is for the agent to act on a representation m that differs from the true loadings \mu:

(13)   \begin{align*} \Lambda &= - \mathtt{E} \left[ V_{a,m}^{\top} V_{a,a}^{-1} V_{a,m} \right] \end{align*}

This matrix \Lambda corresponds to the weighting matrix S used in GMM to differentially interpret the error terms from each of the equations a la Cochrane (2005). For instance, when digesting a vector of errors from a set of pricing equations in GMM, a large s_{i,j} term in the weighting matrix S means that a set of coefficients which produce large errors in equations i and j at the same time will be penalized more heavily. In the quadratic approximation below, \Lambda will be used to differentially interpret the loadings m_n on different decision factors.
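As a sanity check on (13), here is a minimal Python sketch that estimates \Lambda by Monte Carlo, using finite-difference second derivatives and a single action (J = 1). The value function V(a;m,x) = -\frac{1}{2} \cdot (a - m^{\top}x)^2 and the standard normal draws of x are hypothetical choices for illustration. For this V one has V_{a,a} = -1 and V_{a,m} = x^{\top}, so \Lambda reduces to \mathtt{E}[x x^{\top}].

```python
import numpy as np


def V(a, m, x):
    """Hypothetical quadratic value function: penalize the gap between action a and m'x."""
    return -0.5 * (a - m @ x) ** 2


def second_partials(V, a_bar, m_bar, x, h=1e-4):
    """Central finite differences for V_aa (scalar) and V_am (length-N vector) at (a_bar, m_bar)."""
    N = len(m_bar)
    V_aa = (V(a_bar + h, m_bar, x) - 2.0 * V(a_bar, m_bar, x) + V(a_bar - h, m_bar, x)) / h**2
    V_am = np.empty(N)
    for n in range(N):
        e = np.zeros(N)
        e[n] = h
        V_am[n] = (V(a_bar + h, m_bar + e, x) - V(a_bar + h, m_bar - e, x)
                   - V(a_bar - h, m_bar + e, x) + V(a_bar - h, m_bar - e, x)) / (4.0 * h**2)
    return V_aa, V_am


def estimate_Lambda(V, a_bar, m_bar, draws):
    """Monte Carlo version of (13) with one action: Lambda = -E[V_am' V_aa^{-1} V_am]."""
    terms = []
    for x in draws:
        V_aa, V_am = second_partials(V, a_bar, m_bar, x)
        terms.append(-np.outer(V_am, V_am) / V_aa)
    return np.mean(terms, axis=0)


rng = np.random.default_rng(0)
draws = rng.normal(size=(5000, 3))
Lam = estimate_Lambda(V, a_bar=0.0, m_bar=np.zeros(3), draws=draws)
print(np.round(Lam, 2))  # close to the 3 x 3 identity, i.e. E[x x'] for standard normal x
```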

Step 1 (Choose Representation): The agent chooses his optimal sparse representation of the world m as the solution to the optimization problem:

(14)   \begin{align*} \min_{m} \left\{ \ \frac{1}{2} \cdot (m - \mu)^{\top} \Lambda (m - \mu) + \kappa(m) \ \right\} \end{align*}

where \kappa(m) captures the cognitive cost of a model and is defined as:

(15)   \begin{align*} \kappa(m) &= \kappa^m \cdot \sum_{n=1}^N |m_n - \bar{m}_n| \cdot \left\Vert V_{m_n,a} \cdot \eta_m \right\Vert \end{align*}

In the formulation above, \left\Vert V_{m_n,a} \cdot \eta_m \right\Vert plays the same role as 1_{\scriptscriptstyle \frac{\mathtt{dim}[x_n]}{\mathtt{cars}}} in the motivating example and simply controls the units and scale of the agent’s choice of representations. In general, \eta_m will not be problem specific in any material sense.
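With a non-diagonal \Lambda, Step 1 no longer separates coordinate by coordinate, but it remains a convex, weighted \ell_1 (lasso-type) problem. The sketch below solves (14) with the cost (15) by proximal gradient descent (ISTA), a standard convex-optimization method that I swap in here; it is not necessarily how Gabaix (2011) computes the solution. The weights argument stands in for the norms \left\Vert V_{m_n,a} \cdot \eta_m \right\Vert and is simply taken as given, and the matrix Lam and vector mu are hypothetical.

```python
import numpy as np


def soft(z, thresh):
    """Elementwise soft-thresholding operator."""
    return np.sign(z) * np.maximum(np.abs(z) - thresh, 0.0)


def sparse_representation(mu, Lam, m_bar, kappa_m, weights, n_iter=5000):
    """ISTA for min_m 0.5 (m - mu)' Lam (m - mu) + kappa_m * sum_n weights_n * |m_n - m_bar_n|."""
    mu = np.asarray(mu, dtype=float)
    m_bar = np.asarray(m_bar, dtype=float)
    w = np.asarray(weights, dtype=float)
    step = 1.0 / np.linalg.eigvalsh(Lam).max()  # 1 / Lipschitz constant of the quadratic part
    m = m_bar.copy()
    for _ in range(n_iter):
        z = m - step * (Lam @ (m - mu))                   # gradient step on the quadratic loss
        m = m_bar + soft(z - m_bar, step * kappa_m * w)   # proximal step: shrink toward m_bar
    return m


Lam = np.array([[2.0, 0.5, 0.0],
                [0.5, 1.0, 0.3],
                [0.0, 0.3, 1.5]])
mu = np.array([2.0, 0.5, -1.8])
m = sparse_representation(mu, Lam, m_bar=np.zeros(3), kappa_m=1.0, weights=np.ones(3))
print(m)
```

When Lam is the identity, a single ISTA step already lands on the soft-thresholding rule (11), so the sketch nests the motivating example.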

Step 2 (Choose Actions): The agent maximizes over his choice of J actions a:

(16)   \begin{align*} \max_a \left\{ \ V(a;m,x) - \kappa(a) \ \right\} \end{align*}

where the cognitive cost of deviating from the default action is \kappa(a):

(17)   \begin{align*} \kappa(a) &= \kappa^a \cdot \sum_{j=1}^J |a_j - \bar{a}_j| \cdot \left\Vert V_{a_j,m} \cdot \eta_a \right\Vert \end{align*}

In the motivating example, Bill did not face Step 2 of the algorithm at all, as his choice of car production levels was unconstrained after choosing a representation of the world m. This second step allows for difficulty in adjusting the action given the agent’s understanding of the decision problem. For example, consider a decision maker who makes a portfolio decision regarding how to allocate his firm’s wealth. After choosing a representation of the world in Step 1, he might look at the facts x through the lens of this representation m and decide that he needs to lengthen the duration of his portfolio; however, since there are many different ways to do this in practice, he may face cognitive costs in executing such a change. Step 2 captures these sorts of costs.

In summary, a boundedly rational agent with a preference for sparsity first employs a quadratic approximation of his value function and then uses a linear cost function to price the cognitive cost of the complexity of his model of the world. What’s more, this same linear cognitive cost function can be used in a secondary step to incorporate settings where there are cognitive costs to executing a given action.

4. Application to Dynamic Programming

I conclude by showing how to use these tools to solve for a representative agent’s optimal consumption choice in an infinite horizon economy by treating the terms in a Taylor expansion of the optimal consumption rule around the steady state as increasingly complicated decision factors. Consider a standard, discrete time, consumption based model where a representative agent has power utility with risk aversion parameter \gamma and time preference parameter \beta described by the preferences below:

(18)   \begin{align*} \mathtt{E}\left[ \ \sum_{t=0}^\infty \beta^t \cdot \frac{c_t^{1 - \gamma}}{1-\gamma} \ \right] \end{align*}

Assume for simplicity that there is a single riskless asset with constant return \bar{r}, where \beta \cdot (1 + \bar{r}) = 1. Then, I can write this representative agent’s problem recursively as follows:

(19)   \begin{align*} \overline{V}(w_t) &= \max_{c_t} \left\{ \ u(c_t) + \beta \cdot \overline{V}\left( (1 + \bar{r}) \cdot (w_t -c_t) + \bar{y} \right) \ \right\} \\ w_{t+1} &= (1 + \bar{r}) \cdot (w_t - c_t) + \bar{y} \end{align*}

where \overline{V}(w_t) is the agent’s value function given wealth level w_t, u(c) = c^{1-\gamma}/(1-\gamma) is his period utility, and \bar{y} is the constant endowment rate. In this world, since \beta \cdot (1 + \bar{r}) = 1, the Euler equation implies constant consumption at the level c = (\bar{r} \cdot w_t + \bar{y})/(1 + \bar{r}), which keeps wealth constant. Using \ln(1 + \bar{r}) \approx \bar{r}, the optimal consumption choice is given by:

(20)   \begin{align*} \ln \bar{c}_t &=  \ln \left[ \bar{r} \cdot w_t + \bar{y} \right] - \bar{r} \end{align*}
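A quick numerical check of (20), written as a minimal sketch with hypothetical values for w_t, \bar{r} and \bar{y}. Assuming \beta \cdot (1 + \bar{r}) = 1, the constant consumption rule c = (\bar{r} \cdot w_t + \bar{y})/(1 + \bar{r}) leaves wealth unchanged period after period, and (20) differs from its exact logarithm only by \bar{r} - \ln(1 + \bar{r}) \approx \bar{r}^2/2.

```python
import math


def steady_state_consumption(w, r_bar, y_bar):
    """Constant-consumption rule when beta * (1 + r_bar) = 1: c = (r_bar * w + y_bar) / (1 + r_bar)."""
    return (r_bar * w + y_bar) / (1.0 + r_bar)


def wealth_path(w0, r_bar, y_bar, periods=50):
    """Iterate w_{t+1} = (1 + r_bar) * (w_t - c_t) + y_bar under the rule above."""
    path = [w0]
    for _ in range(periods):
        w = path[-1]
        path.append((1.0 + r_bar) * (w - steady_state_consumption(w, r_bar, y_bar)) + y_bar)
    return path


w0, r_bar, y_bar = 100.0, 0.02, 1.0  # hypothetical values
exact = math.log(steady_state_consumption(w0, r_bar, y_bar))
approx = math.log(r_bar * w0 + y_bar) - r_bar  # equation (20)
print(exact - approx)                                           # r_bar - ln(1 + r_bar)
print(max(abs(w - w0) for w in wealth_path(w0, r_bar, y_bar)))  # wealth stays at w0
```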

In the remainder of this example, I assume that even the boundedly rational representative agent can solve this simple problem in closed form. However, when I tweak the problem and allow the true values to be r_t = \bar{r} + \hat{r}_t and y_t = \bar{y} + \hat{y}_t where the idiosyncratic terms \hat{r}_t and \hat{y}_t evolve according to \mathtt{AR}(1) processes below with mean 0 shocks \epsilon_{y,t+1} and \epsilon_{r,t+1}:

(21)   \begin{align*} \hat{y}_{t+1} &= \rho_y \cdot \hat{y}_t + \epsilon_{y,t+1} \\ \hat{r}_{t+1} &= \rho_r \cdot \hat{r}_t + \epsilon_{r,t+1} \end{align*}

the representative agent will want to seek a sparse solution due to cognitive costs. Introducing these idiosyncratic terms yields a more complicated value function V with 2 additional state variables when writing the problem recursively:

(22)   \begin{align*} V(w_t;r_t,y_t) &= \max_{c_t} \left\{ \ u(c_t) + \beta \cdot \mathtt{E}_t \left[ \ V\left( (1 + \bar{r} + \hat{r}_t) \cdot (w_t -c_t) + \bar{y} + \hat{y}_t;r_{t+1},y_{t+1} \right) \ \right] \ \right\} \\ w_{t+1} &= (1 + \bar{r} + \hat{r}_t) \cdot (w_t - c_t) + \bar{y} + \hat{y}_t \end{align*}

where there will in general be no closed form solution for the optimal choice of \ln c_t^{\scriptscriptstyle \mathtt{Rat}}.

However, suppose that we took a Taylor expansion of the optimal \ln c_t^{\scriptscriptstyle \mathtt{Rat}} around the benchmark solution to get:

(23)   \begin{align*} \ln c_t^{\scriptscriptstyle \mathtt{Rat}} &= \ln \bar{c}_t + b_y^{\scriptscriptstyle \mathtt{Rat}} \cdot \hat{y}_t + b_r^{\scriptscriptstyle \mathtt{Rat}} \cdot \hat{r}_t + \mathtt{h.o.t.}  \\ b_y^{\scriptscriptstyle \mathtt{Rat}} &= \frac{\bar{r}}{(1 + \bar{r}) \cdot (1 + \bar{r} - \rho_y) \cdot \bar{c}_t} \\ b_r^{\scriptscriptstyle \mathtt{Rat}} &= \frac{\frac{\bar{r} \cdot (w_t - c_t)}{c_t} - \frac{1}{\gamma}}{1 + \bar{r} - \rho_r} \end{align*}

I treat these 2 coefficients as the boundedly rational representative agent’s \mu, with the default \bar{m} = 0 corresponding to ignoring \hat{y}_t and \hat{r}_t altogether. Then, I can define his optimal log consumption choice as:

(24)   \begin{align*} \ln c_t^{\scriptscriptstyle \mathtt{BR}} &= \ln \bar{c}_t + b_y^{\scriptscriptstyle \mathtt{BR}} \cdot \hat{y}_t + b_r^{\scriptscriptstyle \mathtt{BR}} \cdot \hat{r}_t \end{align*}

where b_r^{\scriptscriptstyle \mathtt{BR}} and b_y^{\scriptscriptstyle \mathtt{BR}} are given by the rule:

(25)   \begin{align*} b_i^{\scriptscriptstyle \mathtt{BR}} &= \begin{cases} b_i^{\scriptscriptstyle \mathtt{Rat}} &\text{ if } | b_i^{\scriptscriptstyle \mathtt{Rat}} | \geq \mathtt{constant} \\ 0 &\text{ else } \end{cases} \end{align*}

A reasonable choice of this constant might be \kappa \cdot \sigma_{\ln c} / \sigma_i for i = r,y, where \sigma_r and \sigma_y are the standard deviations of \hat{r}_t and \hat{y}_t. With this choice, the agent sets b_i^{\scriptscriptstyle \mathtt{BR}} = b_i^{\scriptscriptstyle \mathtt{Rat}} whenever taking the interest rate or endowment level into account would move log consumption by at least \kappa times its standard deviation (i.e., |b_i^{\scriptscriptstyle \mathtt{Rat}}| \cdot \sigma_i \geq \kappa \cdot \sigma_{\ln c}), and otherwise sets b_i^{\scriptscriptstyle \mathtt{BR}} = 0. Thus, a sparsity-seeking boundedly rational representative agent has a particular rule in mind when figuring out which higher order Taylor terms to ignore.
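The rule (25) with the threshold suggested above is easy to code. The sketch below transcribes the two coefficients from (23), evaluated at the steady state c_t = \bar{c}_t, and uses the stationary standard deviation of an \mathtt{AR}(1) process for \sigma_y and \sigma_r. The calibration, including \sigma_{\ln c} and \kappa, is entirely hypothetical and only meant to illustrate the mechanics.

```python
import math


def b_y_rat(r_bar, rho_y, c_bar):
    """Rational loading on the endowment deviation, as written in (23)."""
    return r_bar / ((1 + r_bar) * (1 + r_bar - rho_y) * c_bar)


def b_r_rat(r_bar, rho_r, gamma, w, c):
    """Rational loading on the interest-rate deviation, as written in (23)."""
    return (r_bar * (w - c) / c - 1.0 / gamma) / (1 + r_bar - rho_r)


def ar1_sd(rho, sigma_eps):
    """Stationary standard deviation of an AR(1) process with shock volatility sigma_eps."""
    return sigma_eps / math.sqrt(1 - rho**2)


def sparse_loading(b_rat, kappa, sigma_lnc, sigma_i):
    """Rule (25) with threshold kappa * sigma_lnc / sigma_i: keep b_rat only if it matters enough."""
    return b_rat if abs(b_rat) >= kappa * sigma_lnc / sigma_i else 0.0


# Hypothetical calibration, for illustration only
r_bar, gamma, w, y_bar = 0.02, 2.0, 100.0, 1.0
c_bar = (r_bar * w + y_bar) / (1 + r_bar)
by = b_y_rat(r_bar, rho_y=0.9, c_bar=c_bar)
br = b_r_rat(r_bar, rho_r=0.9, gamma=gamma, w=w, c=c_bar)
sigma_y, sigma_r = ar1_sd(0.9, 0.1), ar1_sd(0.9, 0.001)
kappa, sigma_lnc = 0.1, 0.05
print(sparse_loading(by, kappa, sigma_lnc, sigma_y), sparse_loading(br, kappa, sigma_lnc, sigma_r))
```

In this made-up calibration the endowment loading survives while the interest-rate loading is dropped, because \hat{r}_t moves too little to shift log consumption by \kappa standard deviations.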

Notes: Ait-Sahalia and Jacod (2010)

1. Introduction

In this post, I summarize the econometric method introduced in Analyzing the Spectrum of Asset Returns (JEL 2011) by Yacine Ait-Sahalia and Jean Jacod. From an economic perspective, Delbaen and Schachermayer (1994) (Theorem 1.1) tell us that if there is no arbitrage (more precisely, no free lunch with vanishing risk), then a locally bounded (log) stock price process must be a semi-martingale, where a semi-martingale is defined as follows:

Definition (Semi-Martingale):
A real valued process \{ X_t \}_{t \geq 0} defined on the filtered probability space (\Omega, \{\mathcal{F}_t \}_{t \geq 0}, \mathtt{P}) is called a semi-martingale if it can be decomposed:

(1)   \begin{align*} X_t &= M_t + A_t \end{align*}

where \{M_t\}_{t \geq 0} is a local martingale and \{A_t\}_{t \geq 0} is an adapted process with locally bounded variation.

What’s more, from a purely statistical perspective, if \{ X_t \}_{t \geq 0} is an Ito semi-martingale (i.e., its characteristics are absolutely continuous in time), it can be decomposed into the sum of a drift component, a Brownian component and small and large jump components:

(2)   \begin{align*} X_t &= X_0 + \underbrace{\int_0^t b_s \cdot ds}_{\text{``Drift''}} + \underbrace{\int_0^t \sigma_s \cdot dW_s}_{\text{``Brownian''}} + \underbrace{\int_0^t \int_{|x| \leq \epsilon} x \cdot (\mu - \nu)[ds,dx]}_{\text{\tiny ``Jump''}} + \underbrace{\int_0^t \int_{|x| > \epsilon} x \cdot \mu[ds,dx]}_{\text{\Large ``Jump''}} \end{align*}

In the characterization above, \mu is the random measure counting the jumps of X, \nu is its compensator, and the cutoff \epsilon > 0 between small and large jumps is arbitrary but must be fixed. In this paper, the authors develop a suite of statistical tools based on spectrographic analysis in order to examine the time series properties of high frequency stock data over the time interval [0,T] and determine whether or not the data have:

  1. A Brownian component,
  2. A jump component, and
  3. (if yes…) A finite number of jumps.

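The authors ignore questions about the drift component since it is invisible for all intents and purposes at high frequencies: over an interval of length \Delta_n the drift moves X by an amount of order \Delta_n, while the Brownian component moves it by an amount of order \sqrt{\Delta_n}.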

2. Rough Idea

The authors’ basic unit of observation is the change in the log stock price X over the interval [0,T], sampled every \Delta_n seconds. For notational convenience, the authors write the change in X from observation (i - 1) to observation i when sampling every \Delta_n seconds as:

(3)   \begin{align*} \Delta_i^n X &= X_{i \cdot \Delta_n} - X_{(i - 1) \cdot \Delta_n}  \end{align*}

Since the authors are investigating the volatility and jump behavior of the process X, a natural starting point is to look at sums of powers |\Delta_i^n X|^p of the incremental changes. For instance, with p=2 the sum \sum_{i=1}^{ \lfloor T/\Delta_n \rfloor} \left| \Delta_i^n X \right|^2 is the realized variance of X over [0,T], and dividing it by the number of increments gives the (uncentered) sample variance \mathtt{Var}[\Delta^n X] = (1/ \lfloor T/\Delta_n \rfloor) \cdot \sum_{i=1}^{ \lfloor T/\Delta_n \rfloor} \left| \Delta_i^n X \right|^2.[1] Building on this idea, the authors use the extended concept of variation \mathtt{B}(p,u_n,\Delta_n) defined below, which also truncates large increments, to decompose movements in X into smooth Brownian components and large and small jump components:

(4)   \begin{align*} \mathtt{B}(p,u_n,\Delta_n) &= \sum_{i=1}^{\lfloor T/\Delta_n \rfloor} \left| \Delta_i^n X \right|^p \cdot 1_{\{| \Delta_i^n X| \leq u_n \}} \end{align*}

The power variation function has 3 free parameters which can be adjusted by the econometrician:

  1. Power: p. As p \searrow 0 the smaller increments \Delta_i^n X get relatively more weight in the summation \mathtt{B}(p,u_n,\Delta_n), whereas as p \nearrow \infty the larger increments get relatively more weight. Thus, for p < 2 the power variation estimator \mathtt{B}(p,u_n,\Delta_n) is tuned to the continuous variation in X, while for p > 2 it is tuned to pick up jumps in X.
  2. Truncation Level: u_n. Discarding increments larger than u_n eliminates the large jumps from the series. Over [0,T], X always has only a finite number of jumps larger than any fixed size, but it may have an infinite number of small jumps.
  3. Sampling Frequency: \Delta_n. By sampling at different frequencies k \cdot \Delta_n and \Delta_n with k > 1, we can identify the asymptotic behavior of the power variation as \Delta_n \searrow 0. For instance, if \lim_{\Delta_n \searrow 0} \mathtt{B}(p,u_n,k \cdot \Delta_n)/ \mathtt{B}(p,u_n,\Delta_n) < 1, then the power variation will diverge to \infty for a particular power level p and truncation level u_n. Alternatively, if this limit is 1 or >1, the power variation will converge to a finite value or to 0 respectively.
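A direct transcription of (4) may help fix ideas. This is a minimal Python sketch, assuming the log prices are stored in an array X on an equally spaced grid of step \Delta_n; the function name and the choice to obtain the k \cdot \Delta_n grid by keeping every k-th observation are my own.

```python
import numpy as np


def power_variation(X, p, u=np.inf, k=1):
    """Truncated power variation B(p, u, k * Delta_n) of a path X sampled every Delta_n.

    Keeping every k-th observation gives increments over k * Delta_n;
    increments with absolute value above the truncation level u are discarded."""
    increments = np.diff(np.asarray(X, dtype=float)[::k])
    kept = increments[np.abs(increments) <= u]
    return float(np.sum(np.abs(kept) ** p))


X = [0.0, 1.0, -1.0, 2.0]             # increments 1, -2, 3
print(power_variation(X, p=2))        # 1 + 4 + 9
print(power_variation(X, p=2, u=2.5)) # the increment of size 3 is truncated
print(power_variation(X, p=2, k=2))   # coarser grid: a single increment of -1
```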

The power parameter controls whether the power variation emphasizes small or large movements in X.

Varying the sampling frequency allows us to identify the convergence behavior of the power variation of X at different power and truncation levels.

3. Examples

To get a feel for how to use the power variation of a series X to decompose its movements into Brownian and jump components, I now walk through 2 examples:

Example (Brownian Component Exists): Consider the null hypothesis that the Brownian motion component \int_0^t \sigma_s \cdot dW_s exists in the stochastic process X. For powers p < 2, if the log price process X contains a Brownian component, then the many tiny incremental changes due to Brownian motion will dominate the power variation \mathtt{B}(p,u_n,\Delta_n), and it will diverge to \infty at the rate \Delta_n^{p/2 - 1}. Conversely, without a Brownian component, if \beta \in (0,2) denotes the jump activity index of X, then for all p > \beta the power variation of X will converge to 0 at exactly the same rate for k \cdot \Delta_n and \Delta_n, since only the sizes of the jumps, not their frequency, will matter. This intuition yields a test statistic \mathtt{S}_{\scriptscriptstyle \exists dW}:

(5)   \begin{align*} \mathtt{S}_{\scriptscriptstyle \exists dW}(p, u_n, k, \Delta_n) &= \frac{\mathtt{B}(p,u_n,\Delta_n)}{\mathtt{B}(p,u_n,k \cdot \Delta_n)} \end{align*}

The asymptotic results for this test as \Delta_n \searrow 0 are:

(6)   \begin{align*} \mathtt{S}_{\scriptscriptstyle \exists dW}(p, u_n, k, \Delta_n) &\overset{\mathtt{P}}{\longrightarrow}  \begin{cases} k^{1 - p/2} &\text{ if Brownian component exists} \\ 1 &\text{ else} \end{cases} \end{align*}
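The sketch below illustrates (5) and (6) on simulated data. The assumptions are all mine: T = 1, n = 2^{16} increments, \sigma = 1, p = 1, k = 2 and, as a simplification, no truncation (u = \infty) rather than a shrinking u_n. The comparison path is a pure-jump process with 10 positive jumps of hypothetical size, so its power variation is identical at both sampling frequencies.

```python
import numpy as np


def power_variation(X, p, u=np.inf, k=1):
    """B(p, u, k * Delta_n): sum of |increment|^p over increments with |increment| <= u."""
    increments = np.diff(np.asarray(X, dtype=float)[::k])
    return float(np.sum(np.abs(increments[np.abs(increments) <= u]) ** p))


def S_brownian(X, p, k, u=np.inf):
    """Test statistic (5): B(p, u, Delta_n) / B(p, u, k * Delta_n)."""
    return power_variation(X, p, u) / power_variation(X, p, u, k)


rng = np.random.default_rng(0)
n = 2**16                    # increments on [0, T] with T = 1
dt = 1.0 / n
brownian_path = np.concatenate([[0.0], np.cumsum(rng.normal(0.0, np.sqrt(dt), n))])

jump_increments = np.zeros(n)
jump_increments[rng.choice(n, size=10, replace=False)] = rng.uniform(0.01, 0.05, size=10)
pure_jump_path = np.concatenate([[0.0], np.cumsum(jump_increments)])

p, k = 1.0, 2
print(S_brownian(brownian_path, p, k), k ** (1 - p / 2))  # both close to sqrt(2)
print(S_brownian(pure_jump_path, p, k))                   # 1, up to rounding
```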

Example (Jump Component Exists): Consider the null hypothesis that at least 1 of the jump components exists in the log price process X. If we set the power parameter p > 2, then as we sample at higher and higher frequencies only the jump components of X should matter for the value of the (untruncated) power variation, since the Brownian contribution vanishes at the rate \Delta_n^{p/2 - 1}. This yields a test statistic \mathtt{S}_{\scriptscriptstyle \exists J} defined below with p > 2:

(7)   \begin{align*} \mathtt{S}_{\scriptscriptstyle \exists J}(p,k,\Delta_n) &= \frac{\mathtt{B}(p,\infty, k \cdot \Delta_n)}{\mathtt{B}(p, \infty, \Delta_n)} \end{align*}

The asymptotic results for this test as \Delta_n \searrow 0 are:

(8)   \begin{align*} \mathtt{S}_{\scriptscriptstyle \exists J}(p,k,\Delta_n) &\overset{\mathtt{P}}{\longrightarrow}  \begin{cases} 1 &\text{ if either jump component exists} \\ k^{p/2 - 1} &\text{ else} \end{cases} \end{align*}
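A companion sketch for (7) and (8), again under my own assumptions: T = 1, n = 2^{16}, \sigma = 1, p = 4 and k = 2. The jump path adds 5 jumps of random sign and hypothetical size between 0.5 and 1 to the same diffusive increments, so the two statistics differ only because of the jumps.

```python
import numpy as np


def power_variation(X, p, u=np.inf, k=1):
    """B(p, u, k * Delta_n): sum of |increment|^p over increments with |increment| <= u."""
    increments = np.diff(np.asarray(X, dtype=float)[::k])
    return float(np.sum(np.abs(increments[np.abs(increments) <= u]) ** p))


def S_jump(X, p, k):
    """Test statistic (7) with p > 2: B(p, inf, k * Delta_n) / B(p, inf, Delta_n)."""
    return power_variation(X, p, k=k) / power_variation(X, p)


rng = np.random.default_rng(1)
n = 2**16
dt = 1.0 / n
diffusive = rng.normal(0.0, np.sqrt(dt), n)
jumps = np.zeros(n)
jumps[rng.choice(n, size=5, replace=False)] = rng.choice([-1.0, 1.0], size=5) * rng.uniform(0.5, 1.0, size=5)

no_jump_path = np.concatenate([[0.0], np.cumsum(diffusive)])
jump_path = np.concatenate([[0.0], np.cumsum(diffusive + jumps)])

p, k = 4.0, 2
print(S_jump(no_jump_path, p, k), k ** (p / 2 - 1))  # both close to 2
print(S_jump(jump_path, p, k))                       # close to 1
```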

  1. Here, the \lfloor \cdot \rfloor operator returns the largest integer less than or equal to its argument. So for instance, \lfloor 4.123 \rfloor = 4.

Notes: Kristensen and Mele (2011)

1. Introduction

In this post, I work through Adding and Subtracting Black Scholes, JFE (2011) by Antonio Mele and Dennis Kristensen. This paper develops a method for approximating the price of an asset where no closed form expression exists. In the analysis below, I focus my attention on pricing a European option on a stock when the stock price displays stochastic volatility.

In Section 2, I describe the approximation method which hinges on the wedge between the infinitesimal generator of the asset price under the true specification (e.g., with stochastic volatility) and a simpler reference specification (e.g., with constant volatility). Then, in Section 3 I conduct some simple numerical exercises to see how the approximation behaves in practice.

2. The Method

Consider a stock with price X_t (in \$) at time t (in \mathtt{yr}) whose evolution under the risk neutral measure is described by the equations below:

(1)   \begin{align*} \mathtt{d} X_t &= r \cdot X_t \cdot \mathtt{d}t + V_t^{\scriptscriptstyle \frac{1}{2}} \cdot X_t \cdot \mathtt{d}B_{x,t} \\ \mathtt{d} V_t &= \kappa \cdot \left( \alpha - V_t \right) \cdot \mathtt{d}t + \omega \cdot | V_t |^\xi \cdot \mathtt{d}B_{v,t} \end{align*}

Here, r is the risk free rate in units of 1/\mathtt{yr}, V_t is the instantaneous variance of the stock price in units of 1/\mathtt{yr}, \kappa is the mean reversion rate of the variance of the stock price in units of 1/\mathtt{yr}, \alpha is the long-run mean of the variance of the stock price in units of 1/\mathtt{yr}, and both \omega and \xi are constants parameterizing the volatility of the variance of the stock price, where \omega has units of \mathtt{yr}^{\xi - {\scriptscriptstyle \frac{3}{2}}} and \xi is unitless. Finally, \rho denotes the instantaneous correlation between the two Brownian motions, \mathtt{d}B_{x,t} \cdot \mathtt{d}B_{v,t} = \rho \cdot \mathtt{d}t.

Variance and price of the underlying stock.

Let \phi(X) = \max\left\{ X - K,0 \right\} be the payoff of a European call option on this stock with strike price K (in \$). Then, define the price of this option P as follows:

(2)   \begin{align*} P(X,V;t) &= \mathtt{E} \left[ \ e^{-r \cdot (T-t)} \cdot \phi(X_T) \ \middle| \ X_t, V_t \  \right] \end{align*}

As shown in Ait-Sahalia and Kimmel (2007), when \xi \neq 1/2 there is no closed form solution for P. The authors approximate P(X,V;t) by computing the difference between the infinitesimal generators of P(X,V;t) and \overset{\mathtt{\scriptscriptstyle BS}}{P}(X;t) where \overset{\mathtt{\scriptscriptstyle BS}}{P}(X;t) is the price of a European option on a stock whose volatility is constant and set to \sigma.

Definition (Infinitesimal Generator): Let \{Y_t\} be a time homogeneous Ito diffusion in \mathcal{R}^N. The infinitesimal generator \mathtt{A} of Y_t is defined as:

(3)   \begin{align*} \mathtt{A}[f(y)] &= \lim_{t \searrow 0} \left\{ \frac{\mathtt{E}[f(Y_t) \mid Y_0 = y] - f(y)}{t} \right\} \end{align*}

Source: Oksendal (2003)

The infinitesimal generator characterizes the expected instantaneous rate of change of f(Y_t) if we peer an instant \mathtt{d}t into the future, starting from Y_0 = y. The theorem below gives a method for computing the infinitesimal generator.

Theorem (Infinitesimal Generator of an Ito Process): Let \{Y_t\} be the Ito diffusion in \mathcal{R}^N:

(4)   \begin{align*} \mathtt{d}Y_t &= m(Y_t) \cdot \mathtt{d}t + s(Y_t) \cdot \mathtt{d}B_t \end{align*}

If f \in \mathcal{C}^2 with a compact support then:

(5)   \begin{align*} \mathtt{A}[f(y)] &= \sum_{n=1}^N m_n(y) \cdot \partial_n[f(y)] + \frac{1}{2} \cdot \sum_{n=1}^N \sum_{n'=1}^N \left( s_n(y) s_{n'}(y)^{\top} \right) \cdot \partial_{n,n'}[f(y)] \end{align*}

Source: Oksendal (2003)

To tie this mathematical construction back into the financial application of options pricing, note that in a world with no financial frictions a la Black and Scholes (1973), the expected instantaneous change in the price of an asset under the risk neutral measure must exactly equal the interest payment earned by depositing the current price of the asset in a bank for the next instant. Thus, in generator form, the second order partial differential equation pinning down the price of a European option on a stock whose price has constant volatility can be written as follows:

(6)   \begin{align*} 0 &= \overset{\mathtt{\scriptscriptstyle BS}}{\mathtt{A}}[\overset{\mathtt{\scriptscriptstyle BS}}{P}(X;t)] - r \cdot \overset{\mathtt{\scriptscriptstyle BS}}{P}(X;t) \end{align*}

where the generator can be characterized as:

Proposition (Infinitesimal Generator given Constant Variance): The generator for the price of a European option on a stock with constant volatility \sigma is given by:

(7)   \begin{align*} \overset{\mathtt{\scriptscriptstyle BS}}{\mathtt{A}} &= \partial_t + r \cdot X_t \cdot \partial_x + \frac{\sigma^2 \cdot X_t^2}{2} \cdot \partial_{x,x}  \end{align*}

Proof: If a stock price displays constant volatility \sigma, I can write dX_t as follows:

(8)   \begin{align*} \mathtt{d}X_t &= r \cdot X_t \cdot \mathtt{d}t + \sigma \cdot X_t \cdot \mathtt{d}B_{x,t} \end{align*}

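where \sigma is the instantaneous volatility of the stock price in units of 1/\sqrt{\mathtt{yr}}. Matching this differential equation to the Ito diffusion in the theorem above yields:

(9)   \begin{align*} m(X_t) &= r \cdot X_t \\ s(X_t) &= \sigma \cdot X_t \end{align*}

Substituting these coefficients into the generator formula, and adding \partial_t because the option price also depends on calendar time, gives the generator stated in the proposition.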
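As a numerical sanity check on (6) and (7), the sketch below applies the constant-volatility generator to the Black and Scholes call price by central finite differences and confirms that the result is zero up to discretization error. The default strike, rate, volatility and maturity are hypothetical, and the price used is the standard Black and Scholes call formula with time to maturity T - t.

```python
import math


def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def bs_call(x, t, K=100.0, r=0.10, sigma=0.2, T=1.0):
    """Black-Scholes price of a European call at calendar time t < T."""
    tau = T - t
    s = sigma * math.sqrt(tau)
    q1 = (math.log(x / K) + (r + 0.5 * sigma**2) * tau) / s
    return norm_cdf(q1) * x - norm_cdf(q1 - s) * K * math.exp(-r * tau)


def bs_generator_residual(x, t, r=0.10, sigma=0.2, hx=1e-2, ht=1e-4):
    """Apply generator (7) minus r to the Black-Scholes price by finite differences; (6) says this is 0."""
    P = lambda xx, tt: bs_call(xx, tt, r=r, sigma=sigma)
    P_t = (P(x, t + ht) - P(x, t - ht)) / (2.0 * ht)
    P_x = (P(x + hx, t) - P(x - hx, t)) / (2.0 * hx)
    P_xx = (P(x + hx, t) - 2.0 * P(x, t) + P(x - hx, t)) / hx**2
    return P_t + r * x * P_x + 0.5 * sigma**2 * x**2 * P_xx - r * P(x, t)


for x in (80.0, 100.0, 120.0):
    print(x, bs_generator_residual(x, t=0.5))  # tiny, up to finite-difference error
```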

I can conduct a similar exercise to characterize the price of a European option on a stock with stochastic volatility. For this asset, I know that:

(10)   \begin{align*} 0 &= \mathtt{A}[P(X,V;t)] - r \cdot P(X,V;t) \end{align*}

What’s more, I can use the same theorem to characterize the infinitesimal generator as:

Proposition (Infinitesimal Generator given Stochastic Variance): The generator for the price of a European option on a stock with stochastic volatility V_t is given by:

(11)   \begin{align*} \mathtt{A} &= \partial_t + r \cdot X_t \cdot \partial_x + \frac{V_t \cdot X_t^2}{2} \cdot \partial_{x,x}  \\ &\qquad \qquad + \kappa \cdot \left( \alpha - V_t \right) \cdot \partial_v + \frac{\omega^2 \cdot V_t^{2 \cdot \xi}}{2} \cdot \partial_{v,v}  \\ &\qquad \qquad \qquad \qquad + \rho \cdot \omega \cdot V_t^{\xi + 1/2} \cdot X_t \cdot \partial_{x,v} \end{align*}

Proof: Applying Ito’s lemma to f(X,V;t) and defining y = (X,V,t) for shorthand yields an expression for \mathtt{d}f(y) as:

(12)   \begin{align*} \mathtt{d}f(y) &= \partial_t [f(y)] \cdot \mathtt{d}t + \partial_x [f(y)] \cdot \mathtt{d}X + \partial_v [f(y)] \cdot \mathtt{d}V  \\ &\qquad \qquad + \frac{1}{2} \cdot \left\{ \ \partial_{x,x} [f(y)] \cdot (\mathtt{d}X)^2 + 2 \cdot \partial_{x,v} [f(y)] \cdot (\mathtt{d}X \cdot \mathtt{d}V) + \partial_{v,v}[f(y)] \cdot (\mathtt{d}V)^2 \ \right\} \\ &= \partial_t [f(y)] \cdot \mathtt{d}t + \partial_x [f(y)] \cdot \left\{ r \cdot X \cdot \mathtt{d}t + V^{\scriptscriptstyle \frac{1}{2}} \cdot X \cdot \mathtt{d}B_x \right\} \\ &\qquad + \partial_v [f(y)] \cdot \left\{ \kappa \cdot (\alpha - V) \cdot \mathtt{d}t + \omega \cdot |V|^\xi \cdot \mathtt{d}B_v \right\} \\ &\qquad \qquad + \frac{1}{2} \cdot \left\{ \ \partial_{x,x} [f(y)] \cdot \left( V \cdot X^2 \right) \cdot \mathtt{d}t \right. \\ &\qquad \qquad \qquad \left. + 2 \cdot \partial_{x,v} [f(y)] \cdot \left( \rho \cdot \omega \cdot V^{\xi + {\scriptscriptstyle \frac{1}{2}}} \cdot X \right) \cdot \mathtt{d}t \right. \\ &\qquad \qquad \qquad \qquad \left. + \partial _{v,v} [f(y)] \cdot \left( \omega^2 \cdot V^{2 \cdot \xi} \right) \cdot \mathtt{d}t \ \right\} \\ &= \Big\{ \ \partial_t [f(y)] + \partial_x [f(y)] \cdot r \cdot X + \partial_v [f(y)] \cdot \kappa \cdot (\alpha - V) \\ &\qquad \qquad + \frac{1}{2} \cdot \left( \partial_{x,x} [f(y)] \cdot \left( V \cdot X^2 \right) \right. \\ &\qquad \qquad \qquad \left. + 2 \cdot \partial_{x,v} [f(y)] \cdot \left( \rho \cdot \omega \cdot V^{\xi + {\scriptscriptstyle \frac{1}{2}}} \cdot X \right) \right. \\ &\qquad \qquad \qquad \qquad \left. + \partial _{v,v} [f(y)] \cdot \left( \omega^2 \cdot V^{2 \cdot \xi} \right) \right) \  \Big\} \cdot \mathtt{d}t \\ &\qquad \qquad + \left\{ \partial_x [f(y)] \cdot V^{\scriptscriptstyle \frac{1}{2}} \cdot X \right\} \cdot \mathtt{d}B_{x,t}  \\ &\qquad \qquad \qquad + \left\{ \partial_v [f(y)] \cdot \omega \cdot |V|^\xi \right\} \cdot \mathtt{d}B_{v,t} \end{align*}

Collecting terms, the coefficient on \mathtt{d}t is exactly \mathtt{A}[f(y)], which gives the generator stated in the proposition.

The authors’ key insight is that applying the infinitesimal generator \mathtt{A} to the difference P(X,V;t) - \overset{\mathtt{\scriptscriptstyle BS}}{P}(X;t) yields the expression:

(13)   \begin{align*} \delta(X,V,\sigma;t) &= \mathtt{A}\left[ P(X,V;t) - \overset{\mathtt{\scriptscriptstyle BS}}{P}(X;t) \right] - r \cdot \left(P(X,V;t) - \overset{\mathtt{\scriptscriptstyle BS}}{P}(X;t) \right) \\ &= \frac{(\sigma^2 - V_t) \cdot X_t^2}{2} \cdot \partial_{x,x}[ \overset{\mathtt{\scriptscriptstyle BS}}{P}(X;t)] \end{align*}

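given the boundary condition that P(X,V;T) - \overset{\mathtt{\scriptscriptstyle BS}}{P}(X;T) = 0 for all V and X. This wedge \delta(X,V,\sigma;t) has the natural interpretation of the instantaneous profit of a trader who sells the option and continuously hedges its risk using the incorrect constant volatility model, so that -\delta(X,V,\sigma;t) is the hedging cost he incurs. Since the difference P(X,V;t) - \overset{\mathtt{\scriptscriptstyle BS}}{P}(X;t) satisfies the equation above with a zero terminal condition, the Feynman-Kac formula lets us write the true price as the Black and Scholes price plus the expected discounted hedging cost:

(14)   \begin{align*} P(X,V;t) &= \overset{\mathtt{\scriptscriptstyle BS}}{P}(X;t) - \mathtt{E}\left[ \ \int_t^T e^{-r \cdot (u-t)} \cdot \delta(X_u,V_u,\sigma;u) \cdot \mathtt{d}u \ \middle| \ X_t, V_t \ \right] \end{align*}

This is a particularly nice formulation, as the correction term depends on the option only through the Black and Scholes (1973) price with constant volatility, via its second derivative \partial_{x,x}[\overset{\mathtt{\scriptscriptstyle BS}}{P}(X;t)].

3. Numerical Results

To put this approximation method to work, I Taylor-expand the expected hedging cost in (14) in the time to maturity T - t around \overset{\mathtt{\scriptscriptstyle BS}}{P}(X;t):

(15)   \begin{align*} P(X,V;t) &= \overset{\mathtt{\scriptscriptstyle BS}}{P}(X;t) - \sum_{n=0}^\infty \frac{(T-t)^{n+1}}{(n+1)!} \cdot \delta_n(X,V;t) \end{align*}

Each of the expansion terms can be expressed recursively as follows:

(16)   \begin{align*} \delta_n(X,V;t) &= \mathtt{A}[\delta_{n-1}(X,V;t)] - r \cdot \delta_{n-1}(X,V;t) \\ \delta_0(X,V;t) &= \frac{(\sigma^2 - V_t) \cdot X_t^2}{2} \cdot \partial_{x,x}[ \overset{\mathtt{\scriptscriptstyle BS}}{P}(X;t)] \end{align*}

The Black and Scholes (1973) price of a European call given constant volatility \sigma can be expressed as:

(17)   \begin{align*} \overset{\mathtt{\scriptscriptstyle BS}}{P}(X;t) &= N(q_1) \cdot X - N(q_2) \cdot K \cdot e^{-r \cdot (T-t)} \\ q_1 &= \frac{\log \left( \frac{X}{K} \right) + \left( r + \frac{\sigma^2}{2} \right) \cdot (T-t)}{\sigma \cdot \sqrt{T-t}} \\ q_2 &= q_1 - \sigma \cdot \sqrt{T-t} \end{align*}

where N(\cdot) is the standard normal CDF and N'(\cdot) its density. Write \mathtt{D}(X;t) = X^2 \cdot \partial_{x,x}[\overset{\mathtt{\scriptscriptstyle BS}}{P}(X;t)] = X \cdot N'(q_1)/(\sigma \cdot \sqrt{T-t}) for the option's dollar gamma. Because \overset{\mathtt{\scriptscriptstyle BS}}{\mathtt{A}} - r and X^2 \cdot \partial_{x,x} both have constant coefficients in \log X, they commute, so (\overset{\mathtt{\scriptscriptstyle BS}}{\mathtt{A}} - r)[\mathtt{D}] = 0, and applying (16) once gives \delta_1 = - \frac{(V_t - \sigma^2)^2}{4} \cdot X_t^2 \cdot \partial_{x,x}[\mathtt{D}] - \frac{\kappa \cdot (\alpha - V_t)}{2} \cdot \mathtt{D} - \frac{\rho \cdot \omega \cdot V_t^{\xi + 1/2}}{2} \cdot X_t \cdot \partial_x[\mathtt{D}]. Thus, expanding out the first 2 approximating terms yields:

(18)   \begin{align*} P_0(X,V;t) &= \overset{\mathtt{\scriptscriptstyle BS}}{P}(X;t) + (T-t) \cdot \frac{V_t - \sigma^2}{2} \cdot \mathtt{D}(X;t) \\ P_1(X,V;t) &= P_0(X,V;t) + \frac{(T-t)^2}{2} \cdot \left[ \frac{(V_t - \sigma^2)^2}{4} \cdot X_t^2 \cdot \partial_{x,x}[\mathtt{D}(X;t)] + \frac{\kappa \cdot (\alpha - V_t)}{2} \cdot \mathtt{D}(X;t) + \frac{\rho \cdot \omega \cdot V_t^{\xi + {\scriptscriptstyle \frac{1}{2}}}}{2} \cdot X_t \cdot \partial_x[\mathtt{D}(X;t)] \right] \end{align*}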

To simplify calculations, I set the Black and Scholes variance equal to the current variance, \sigma^2 = V_t = 0.05, which kills off the first correction term in P_0(X,V;t) in my numerical analysis. Below I replicate Figure 3 from Kristensen and Mele (2011) and plot the mispricing of the N = 0 and N = 1 approximations as a fraction of the Black and Scholes (1973) price with \xi = 1/2. The strike price is K = 100, the time to maturity is 1 year, and the remaining parameter values are \kappa = 2, \alpha = 0.04, \omega = 0.10, \rho = -0.50 and r = 0.10.

Mispricing as a fraction of the Black and Scholes (1973) price with xi = 1/2 for N = 0, 1. The strike price is K = 100, the time to maturity is 1 year, the remaining parameter values are kappa = 2, alpha = 0.04, omega = 0.10, rho = -0.50 and r = 0.10.

The code for the simulations can be found on www.SageMath.org. The simulation represents estimates from 500 iterations at 40 price points between X_0 = 80 and X_0 = 120.

The Buckingham Pi Theorem

1. Introduction

In this post I outline the Buckingham \pi Theorem which shows how to use dimensional analysis to compute answers to seemingly intractable physical problems. For instance, in 1950 Geoffrey Taylor used the theorem to work out the energy payload released by the 1945 Trinity test atomic explosion in New Mexico simply by looking at slow motion video records. My main source for this post is Bluman and Kumei (1989).

Frame from slow motion footage of the Trinity nuclear test with a distance measurement showing the blast radius as well as a time measure showing seconds elapsed since detonation.

2. Basic Framework

The Buckingham \pi Theorem concerns physical problems with the following form: There is a variable of interest, y, which is some unknown function of N different physical quantities x_1, x_2, \ldots, x_N.

(1)   \begin{align*} y &= f(x_1, x_2, \ldots, x_N) \end{align*}

Each of these physical quantities is measured in terms of only M \leq (N-1) fundamental dimensions labeled c_1, c_2, \ldots, c_M. Thus, I can define a dimension operator which gives the dimensions of an arbitrary variable z \in \{y, x_1, x_2, \ldots, x_N\} and write its output as:

(2)   \begin{align*} \mathtt{dim}[z] &= \prod_{m=1}^{M} c_m^{a_m} \end{align*}

So, for example, if z is measuring pressure on the surface of a table, I could write \mathtt{dim}[z] = \mathtt{lb}/\mathtt{in}^2 where c_1 = \mathtt{lb}, c_2 = \mathtt{in}, a_1 = 1 and a_2 = -2.

Definition (Dimensionless Quantity):
An arbitrary variable z \in \{y, x_1, x_2, \ldots, x_N\} is dimensionless if:

(3)   \begin{align*} \mathtt{dim}[z] &= 1 \end{align*}

3. Main Result

Now, I show how to reformulate this problem and apply linear algebra to the dimensional exponents to derive a characterization of the solution to f(x_1,x_2,\ldots,x_N) - y = 0 as a function of dimensionless quantities. First, I define A as an M \times N matrix of dimensional exponents for x_1,x_2,\ldots,x_N and B as the M \times 1 vector of dimensional exponents for y:

(4)   \begin{align*} A &= \begin{bmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,N} \\ a_{2,1} & a_{2,2} & \cdots & a_{2,N} \\ \vdots & \vdots & \ddots & \vdots \\ a_{M,1} & a_{M,2} & \cdots & a_{M,N} \end{bmatrix}, \quad B = \begin{bmatrix} b_1 & b_2 & \cdots &  b_M  \end{bmatrix}^{\top} \end{align*}

Next, if A has full rank M, then the null space of A has dimension K = N - M, so the system of equations 0 = AU has K linearly independent solutions. Define U as the N \times K matrix whose columns are these solutions.

(5)   \begin{align*} U &= \begin{bmatrix} u_{1,1} & u_{1,2} & \cdots & u_{1,K} \\ u_{2,1} & u_{2,2} & \cdots & u_{2,K} \\ \vdots & \vdots & \ddots & \vdots \\  u_{N,1} & u_{N,2} & \cdots & u_{N,K} \end{bmatrix} \end{align*}

Finally, define V as any particular solution to the system of equations 0 = AV + B:

(6)   \begin{align*} V &= \begin{bmatrix} v_1 & v_2 & \cdots &  v_N  \end{bmatrix}^{\top} \end{align*}

With these objects in hand, I can now state the Buckingham \pi Theorem.

Proposition (Buckingham \pi):
A physical system y = f(x_1,x_2,\ldots,x_N) with M fundamental dimensions can be restated as:

(7)   \begin{align*} y &= \frac{g(\pi_1,\pi_2,\cdots,\pi_K)}{x_1^{v_1} \cdot x_2^{v_2} \cdots x_N^{v_N}}, \end{align*}

where K = N - M, g(\cdot) is an unknown function and \{\pi_1,\pi_2,\cdots,\pi_K\} are dimensionless parameters constructed from the physical parameters \{x_1,x_2,\ldots,x_N\} using the columns of U in equations of the form below:

(8)   \begin{align*} \pi_k &= x_1^{u_{1,k}} \cdot x_2^{u_{2,k}} \cdots x_N^{u_{N,k}}, \qquad k = 1, \ldots, K. \end{align*}
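The matrices U and V can be computed exactly by Gaussian elimination over the rationals. Below is a minimal plain-Python sketch using `fractions.Fraction`. The function names are hypothetical. Setting each free variable to 1 in turn (for U) or to 0 (for V) is only one convenient choice of basis, so the columns returned may be rescaled or recombined relative to a hand derivation. Any product of powers of dimensionless groups is itself dimensionless, so such differences do not affect the theorem.

```python
from fractions import Fraction

def rref(rows):
    """Reduced row echelon form over the rationals; returns (matrix, pivot columns)."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        lead = m[r][c]
        m[r] = [x / lead for x in m[r]]
        for i in range(len(m)):                 # clear column c in every other row
            if i != r and m[i][c] != 0:
                factor = m[i][c]
                m[i] = [x - factor * y for x, y in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

def null_space(A):
    """Columns of U: one solution of A u = 0 per free variable."""
    R, pivots = rref(A)
    n = len(A[0])
    basis = []
    for f in (c for c in range(n) if c not in pivots):
        u = [Fraction(0)] * n
        u[f] = Fraction(1)
        for row, p in enumerate(pivots):
            u[p] = -R[row][f]
        basis.append(u)
    return basis

def particular_solution(A, B):
    """One V with A V + B = 0, with every free variable set to zero."""
    R, pivots = rref([list(row) + [-b] for row, b in zip(A, B)])
    n = len(A[0])
    if n in pivots:                             # pivot in the augmented column
        raise ValueError("A V = -B has no solution")
    v = [Fraction(0)] * n
    for row, p in enumerate(pivots):
        v[p] = R[row][n]
    return v

# Applied to the blast-wave matrix of Equation (9) in Section 4.
A = [[2, 0, -3, -1], [1, 0, 1, 1], [-2, 1, 0, -2]]
B = [1, 0, 0]
print(null_space(A))              # one column, proportional to (-2, 6, -3, 5)
print(particular_solution(A, B))  # (-1/5, -2/5, 1/5, 0)
```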

4. An Example

I now give an example of how to employ this theorem by working out the nuclear payload example from the introduction. For more information on this example, take a look at this blog post. Suppose that an atomic blast has a shock wave radius of \delta = f(\epsilon,\tau,\rho,\phi) with variables:

  1. \epsilon: Energy released by the explosion,
  2. \tau: Time elapsed since the explosion took place,
  3. \rho: Initial density, and
  4. \phi: Initial pressure.

For this problem N = 4 and M = 3, with fundamental dimensions of length l, mass m, and time t. Ordering the rows as (l, m, t) and the columns as (\epsilon, \tau, \rho, \phi) yields the dimensional matrix A written below:

(9)   \begin{align*} A &= \begin{bmatrix} 2 & 0 & -3 & -1 \\ 1 & 0 & 1 & 1 \\ -2 & 1 & 0 & -2 \end{bmatrix} \end{align*}

Each column of A records the dimensional exponents of one variable. For instance, energy has dimensions l^2 \cdot m \cdot t^{-2}, which gives the first column (2, 1, -2). Since K = N - M = 1, there is only 1 dimensionless constant, which can be computed as the solution to the system of equations 0 = AU:

(10)   \begin{align*} \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} &= \begin{bmatrix} 2 & 0 & -3 & -1 \\ 1 & 0 & 1 & 1 \\ -2 & 1 & 0 & -2 \end{bmatrix} \begin{bmatrix} - \alpha \cdot 2 \\ \alpha \cdot 6 \\ - \alpha \cdot 3 \\ \alpha \cdot 5 \end{bmatrix}  \end{align*}

Setting the scalar free parameter \alpha = 1, I can write \pi_1 as:

(11)   \begin{align*} \pi_1 &= \epsilon^{-2} \cdot \tau^{6} \cdot \rho^{-3} \cdot \phi^{5} \end{align*}

The shock wave radius \delta has units of length, yielding the dimension vector B below:

(12)   \begin{align*} B &= \begin{bmatrix} 1 & 0 & 0 \end{bmatrix}^{\top} \end{align*}

The system of 3 equations 0 = AV + B has 4 unknowns:

(13)   \begin{align*} 0 &= 1 + 2 \cdot v_1 - 3 \cdot v_3 - v_4 \\ 0 &= v_1 + v_3 + v_4 \\ 0 &= -2 \cdot v_1 + v_2 - 2 \cdot v_4 \end{align*}

Thus, the vector V is defined up to a single free parameter \hat{\alpha} \in \mathbb{R}:

(14)   \begin{align*} V &= \begin{bmatrix} -(1 + 2 \cdot \hat{\alpha})/5 \\ 2 \cdot (3 \cdot \hat{\alpha} - 1) / 5 \\ (1 - 3 \cdot \hat{\alpha}) / 5 \\ \hat{\alpha} \end{bmatrix} \end{align*}

Using the formula \pi = \delta \cdot \epsilon^{v_1} \cdot \tau^{v_2} \cdot \rho^{v_3} \cdot \phi^{v_4} and setting \hat{\alpha} = 0, so that the exponent on the pressure \phi is zero, I can compute \pi as:

(15)   \begin{align*} \pi &= \delta \cdot \left[ \frac{\epsilon \cdot \tau^2}{\rho} \right]^{-1/5}  \end{align*}
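As a quick consistency check on Equations (10) and (15), the sketch below multiplies the dimensional matrix from Equation (9) by both exponent vectors. It assumes the row order (length, mass, time) and the column order (\epsilon, \tau, \rho, \phi) used in Equation (9). It takes V at \hat{\alpha} = 0, i.e. V = (-1/5, -2/5, 1/5, 0), which are the exponents appearing in Equation (15). The helper name `mat_vec` is illustrative.

```python
from fractions import Fraction as F

# Rows: length l, mass m, time t. Columns: epsilon, tau, rho, phi (Equation (9)).
A = [[2, 0, -3, -1],
     [1, 0, 1, 1],
     [-2, 1, 0, -2]]
B = [1, 0, 0]                          # delta is a length
U = [-2, 6, -3, 5]                     # Equation (10) with alpha = 1
V = [F(-1, 5), F(-2, 5), F(1, 5), 0]   # exponents on epsilon, tau, rho, phi at alpha-hat = 0

def mat_vec(M, x):
    """Exact matrix-vector product over the rationals."""
    return [sum(F(a) * b for a, b in zip(row, x)) for row in M]

# A U = 0: pi_1 = eps^-2 tau^6 rho^-3 phi^5 carries no dimensions.
u_ok = all(x == 0 for x in mat_vec(A, U))
# A V + B = 0: delta * eps^(-1/5) tau^(-2/5) rho^(1/5) is dimensionless.
v_ok = all(x + b == 0 for x, b in zip(mat_vec(A, V), B))
print(u_ok, v_ok)  # True True
```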

Combining all of these results yields a formulation for \delta in terms of the scale factor (\epsilon \cdot \tau^2/\rho)^{1/5} and an unknown function g of the dimensionless quantity \pi_1:

(16)   \begin{align*} \delta &= \left[ \frac{\epsilon \cdot \tau^2}{\rho} \right]^{1/5} \cdot g \left( \pi_1 \right) \end{align*}

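Because \pi_1 = \epsilon^{-2} \cdot \tau^{6} \cdot \rho^{-3} \cdot \phi^{5} is close to 0 at early times (small \tau) and when the ambient pressure \phi is small, expanding g around \pi_1 = 0 with g(0) \neq 0 yields, to leading order, \delta \approx c \cdot \tau^{2/5}, with scaling constant c given by:

(17)   \begin{align*} c &= \left( \frac{\epsilon}{\rho} \right)^{1/5} \cdot g(0) \end{align*}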
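Inverting the leading-order law with g(0) = 1 gives \epsilon = \rho \cdot \delta^5 / \tau^2. This is how a single radius and time pair read off the footage translates into an energy estimate. The sketch below illustrates that step under this assumption. The numbers in the usage lines are hypothetical round values in SI units, not the declassified Trinity measurements, and the function names are illustrative.

```python
import math

def blast_radius(energy, t, rho, g0=1.0):
    """Leading-order shock radius: delta = g(0) * (energy * t^2 / rho)^(1/5)."""
    return g0 * (energy * t ** 2 / rho) ** 0.2

def energy_from_radius(delta, t, rho, g0=1.0):
    """Invert the law above: energy = rho * (delta / g(0))^5 / t^2."""
    return rho * (delta / g0) ** 5 / t ** 2

def loglog_slope(times, radii):
    """Least-squares slope of log(delta) on log(t); the theory predicts 2/5."""
    xs = [math.log(t) for t in times]
    ys = [math.log(r) for r in radii]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# Hypothetical inputs: joules, seconds, kg/m^3.
rho_air = 1.2
times = [0.001 * k for k in range(1, 21)]
radii = [blast_radius(1e14, t, rho_air) for t in times]
print(loglog_slope(times, radii))                          # 0.4 up to rounding
print(energy_from_radius(radii[-1], times[-1], rho_air))   # 1e14 up to rounding
```

On real data, the fitted slope on the \log \times \log scale tests the \tau^{2/5} law. Each observed point then gives its own energy estimate.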

Setting g(0) = 1, the \log \times \log plot of \delta vs \tau yielded an accurate fit where the predicted values fall on the solid line and the (declassified) empirically observed values are denoted by +’s in the plot below:

Notes: Glosten and Milgrom (1985)

1. Introduction

In this post, I replicate the main results from Glosten and Milgrom (1985) using the setup outlined in Back and Baruch (2003). I begin in Section 2 by laying out the continuous time asset pricing framework. I consider the behavior of an informed trader who trades a single risky asset with a market maker that is constrained by perfect competition. Then, in Section 3, I characterize the optimal trading strategy of the informed agent as a system of first order conditions and boundary constraints. Finally, in Section 4, I show how to solve this model numerically.

2. Asset Pricing Framework

There is a single risky asset which pays out v \in \{0,1\} at a random date \tau > 0. There is an informed trader and a stream of uninformed traders who arrive with Poisson intensity \beta. All traders have a fixed order size of \delta  = 1. The model end date \tau is distributed exponentially with intensity \kappa.

Let z_{t-} denote the net position of the noise traders up to but not including date t and let x_{t-} denote the net position of the informed trader up to but not including date t. At each time t the market maker sees an anonymous order flow dy_t = dx_t + dz_t, so that \{y_s \mid s \leq t\} generates a \sigma-field \mathcal{F}_t^y which represents the market maker’s information set.

Perfect competition dictates that the market maker sets the price of the risky asset p_t = \mathbb{E}\left[ \ v \ \middle| \ \mathcal{F}_t^y \ \right]. Let b_t and a_t denote the bid and ask prices at time t.

(1)   \begin{align*} a_t &= \mathbb{E} \left[ \ v \ \middle| \ \mathcal{F}_{t-}^y, \ dy_t = 1 \ \right] \\ b_t &= \mathbb{E} \left[ \ v \ \middle| \ \mathcal{F}_{t-}^y, \ dy_t = -1 \ \right] \end{align*}

Let p_{t-} be the left limit of the price p at time t. Given that v \in \{0,1\}, we can interpret p_{t-} as the probability of the event v=1 at time t given the information set \mathcal{F}_{t-}^y. The informed trader chooses a trading strategy \{dx_t\}_{t \leq \tau} in order to maximize his end-of-game wealth at random date \tau with 0 discount rate. Let dx_t^+ = \max\{0,dx_t\} denote his purchases and dx_t^- = \max\{0,-dx_t\} denote his sales, so that both are non-negative.

(2)   \begin{align*} w &= \max_{\{dx_t\}_{t \leq \tau}} \left\{ \mathbb{E} \left[ \ \int_0^\tau \left( v - a_t \right) \cdot dx_t^+ + \int_0^\tau \left( b_t - v \right) \cdot dx_t^- \ \middle| \ v \ \right] \right\} \end{align*}

In order to guarantee a solution to the optimization problem posed above, I restrict the domain of potential trading strategies to those that generate finite end of game wealth.

(3)   \begin{align*} \infty > \max_{\{dx_t\}_{t \leq \tau}} \left\{ \mathbb{E} \left[ \ \int_0^\tau \left( v - a_t \right) \cdot dx_t^+ \ \middle| \ v = 1 \ \right] \right\} \\ \infty > \max_{\{dx_t\}_{t \leq \tau}} \left\{ \mathbb{E} \left[ \ \int_0^\tau \left( b_t - v \right) \cdot dx_t^- \ \middle| \ v = 0 \ \right] \right\} \end{align*}

I then look for trading intensities such that the informed trader’s cumulative purchases and sales, net of the compensators implied by those intensities, are martingales.

Definition: At each time t \leq \tau, an equilibrium consists of a pair of bid and ask prices,

(4)   \begin{align*} \begin{bmatrix} b & a \end{bmatrix}_{s \leq t} \end{align*}

as well as a vector of trading intensities,

(5)   \begin{align*} \Theta &= \begin{bmatrix} \theta_{H,B} & \theta_{H,S} & \theta_{L,B} & \theta_{L,S} \end{bmatrix}_{s \leq t} \end{align*}

such that the prices equal the conditional expectation of the asset value relative to \mathcal{F}_{t-}^y given a sell or buy order and the trading intensities solve the informed agent’s objective function, satisfy the finiteness conditions and are martingales relative to the informed trader’s information set \mathcal{F}_{t-}^y:

(6)   \begin{align*} 0 &= \mathbb{E} \left[ \ x_t^+ - v \cdot \int_0^t \theta_{H,B} \left( p_{s-} \right) \cdot ds - \left( 1 - v \right) \cdot \int_0^t \theta_{L,B} \left( p_{s-}\right) \cdot ds \ \middle| \ \mathcal{F}_{t-}^y \ \right] \\ 0 &= \mathbb{E} \left[ \ x_t^- - v \cdot \int_0^t \theta_{H,S} \left( p_{s-} \right) \cdot ds - \left( 1 - v \right) \cdot \int_0^t \theta_{L,S} \left( p_{s-}\right) \cdot ds \ \middle| \ \mathcal{F}_{t-}^y \ \right] \end{align*}

In the definition above, the \{H,L\} and \{B,S\} subscripts denote the realized value and trade directions for the informed traders. So, for example, \theta_{H,B} denotes the trading intensity at some time t in the buy direction of an informed trader who knows that the value of the asset is v=1.

3. Optimal Trading Strategies

I now characterize the equilibrium trading intensities of the informed traders. First, observe that since \tau is exponentially distributed and hence memoryless, the only relevant state variable at time t is p_{t-}. Thus, in the equations below, I drop the time dependence wherever it causes no confusion.

Since the bid and ask prices are conditional expectations, we can compute their values using Bayes’ rule.

(7)   \begin{align*} a(p) &= \left( \frac{ p \cdot \theta_{H,B} ( p ) }{p \cdot \theta_{H,B}(p) + ( 1 - p ) \cdot \theta_{L,B} (p) + \beta} \right) \cdot 1 \\ &\qquad \qquad + \left( \frac{( 1 - p ) \cdot \theta_{L,B}(p)}{p \cdot \theta_{H,B}(p) + ( 1 - p ) \cdot \theta_{L,B} (p) + \beta} \right) \cdot 0 \\ &\qquad \qquad \qquad \qquad + \left( \frac{\beta}{p \cdot \theta_{H,B} \left( p \right) + ( 1 - p ) \cdot \theta_{L,B} (p) + \beta} \right) \cdot p \\ &= \frac{ p \cdot \theta_{H,B} ( p ) + p \cdot \beta}{p \cdot \theta_{H,B} ( p ) + ( 1 - p ) \cdot \theta_{L,B} ( p ) + \beta} \\ b(p) &= \frac{p \cdot \theta_{H,S} ( p ) + p \cdot \beta}{p \cdot \theta_{H,S} ( p ) + ( 1 - p ) \cdot \theta_{L,S} (p ) + \beta} \end{align*}
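Equation (7) translates directly into code. Below is a minimal Python sketch. The function names are hypothetical. The intensity values in the usage lines are arbitrary illustrations that respect \theta_{H,B} > \theta_{L,B} and \theta_{L,S} > \theta_{H,S}.

```python
def ask_price(p, th_hb, th_lb, beta):
    """a(p) in Equation (7): E[v | buy order] by Bayes' rule."""
    return (p * th_hb + p * beta) / (p * th_hb + (1 - p) * th_lb + beta)

def bid_price(p, th_hs, th_ls, beta):
    """b(p) in Equation (7): E[v | sell order] by Bayes' rule."""
    return (p * th_hs + p * beta) / (p * th_hs + (1 - p) * th_ls + beta)

# Arbitrary illustrative intensities with no bluffing (theta_LB = theta_HS = 0).
p, beta = 0.4, 0.5
a = ask_price(p, th_hb=2.0, th_lb=0.0, beta=beta)
b = bid_price(p, th_hs=0.0, th_ls=2.0, beta=beta)
print(b < p < a)  # True
```

If both informed types trade in a given direction with equal intensity, that quote collapses to p, because an order in that direction then carries no information.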

I now want to derive a set of first order conditions regarding the optimal decisions of high and low type informed agents as functions of these bid and ask prices which can be used to pin down the equilibrium vector of 4 trading intensities. Let w_H(p) and w_L(p) denote the value functions of the high and low type informed traders respectively.

Condition 1: No arbitrage implies that a(p) > p > b(p) for all p \in (0,1) with a(0) = 0 = b(0) and a(1) = 1 = b(1) since:

(8)   \begin{align*} \mathbb{E} \left[ \ v \ \middle| \ \mathcal{F}_{t-}^y, \ dy_t = 1 \ \right] \geq  \mathbb{E} \left[ \ v \ \middle| \ \mathcal{F}_{t-}^y \  \right] \geq  \mathbb{E} \left[ \ v \ \middle| \ \mathcal{F}_{t-}^y, \ dy_t = -1 \ \right] \end{align*}

Thus, for all p \in (0,1) it must be that \theta_{H,B}(p) > \theta_{L,B}(p) \geq 0 and \theta_{L,S}(p) > \theta_{H,S}(p) \geq 0. What’s more, p=1 and p=0 are absorbing points for p, meaning that w_L(0) = w_H(1) = 0 while w_L(1) = w_H(0) = \infty.

Condition 2: At a buy or sell order, the price jumps from p \nearrow a(p) or from p \searrow b(p), so we can think about the stochastic process dp as composed of a deterministic drift component \mu(p) and 2 jump components with magnitudes \{a(p) - p\} and \{b(p) - p\}.

(9)   \begin{align*} \mathbb{E}\left[ dp \right] &= \mu(p) \cdot dt + \lambda_a \cdot \left\{ a(p) - p \right\} \cdot dt + \lambda_b \cdot \left\{ b(p) - p \right\} \cdot dt \end{align*}

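The arrival intensities \lambda_a and \lambda_b of buy and sell orders, from the market maker’s perspective, sum the intensities of the high type (weighted by p), the low type (weighted by 1 - p) and the noise traders: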

(10)   \begin{align*} \lambda_a &= p \cdot \theta_{H,B}(p) + (1 - p) \cdot \theta_{L,B}(p) + \beta \\ \lambda_b &= p \cdot \theta_{H,S}(p) + (1 - p) \cdot \theta_{L,S}(p) + \beta \end{align*}

Substituting in the formulas for a(p) and b(p) from above yields an expression for the price change that is purely in terms of the trading intensities and the price.

(11)   \begin{align*} \mathbb{E}\left[dp\right] &= \mu(p) \cdot dt + p \cdot \left( 1 - p \right) \cdot \left( \theta_{H,B}(p) + \theta_{H,S}(p) - \theta_{L,B}(p) - \theta_{L,S}(p) \right) \cdot dt \end{align*}

However, under the conditional expectation price setting rule, p must be a martingale, meaning that \mathbb{E}[dp] = 0. Solving Equation (11) for the drift gives:

(12)   \begin{align*} \mu(p) &= p \cdot \left( 1 - p \right) \cdot \left( \theta_{L,B}(p) + \theta_{L,S}(p) - \theta_{H,B}(p) - \theta_{H,S}(p) \right) \end{align*}
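The algebra behind Equations (11) and (12) can be checked numerically. The idea is to plug arbitrary intensities into Equations (7), (9) and (10), use the drift from Equation (12), and confirm that the expected price change vanishes. In the sketch below, the intensities are arbitrary test values and the function names are illustrative.

```python
def drift(p, th_hb, th_hs, th_lb, th_ls):
    """Between-trade drift mu(p) in Equation (12)."""
    return p * (1 - p) * (th_lb + th_ls - th_hb - th_hs)

def expected_price_change(p, th_hb, th_hs, th_lb, th_ls, beta):
    """E[dp]/dt in Equation (9), built from Equations (7), (10) and (12)."""
    lam_a = p * th_hb + (1 - p) * th_lb + beta   # buy-order intensity, Equation (10)
    lam_b = p * th_hs + (1 - p) * th_ls + beta   # sell-order intensity, Equation (10)
    a = (p * th_hb + p * beta) / lam_a           # ask, Equation (7)
    b = (p * th_hs + p * beta) / lam_b           # bid, Equation (7)
    mu = drift(p, th_hb, th_hs, th_lb, th_ls)
    return mu + lam_a * (a - p) + lam_b * (b - p)

print(expected_price_change(0.3, 1.5, 0.2, 0.4, 2.0, 0.5))  # zero up to rounding
```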

Condition 3: At the time of a buy or sell order, smooth pasting implies that the informed trader was indifferent between placing the order or not. For instance, if he strictly preferred to place the order, he would have done so earlier via the continuity of the price process.

(13)   \begin{align*} w_H(p) &= \left(  1 - a(p) \right) + w_H( a(p) ) \\ w_L(p) &= b(p) + w_L ( b(p) ) \end{align*}

Condition 4: It is not optimal for the informed traders to bluff. That is, a high type informed trader can never increase his value function by selling at time t, and vice versa for a low type informed trader.

(14)   \begin{align*} w_H(p) &\geq \left(  b(p) - 1 \right) + w_H ( b(p) ) \\ w_L(p) &\geq -a(p) + w_L ( a(p) ) \end{align*}

Condition 5: In all time periods in which the informed trader does not trade, smooth pasting implies that he must be indifferent between trading and delaying an instant dt. There are 2 forces at work here. In each instant dt, there is a \kappa \cdot dt probability that \tau will arrive and the informed trader’s value function will plummet to 0. This cost has to be offset by the value of delaying. For the high type informed trader, this value includes the value change due to the price drift (dw_H/dp) \cdot \mu(p), the value change due to an uninformed trader placing a buy order, which arrives with intensity \beta, and the value change due to an uninformed trader placing a sell order, which also arrives with intensity \beta. Similar reasoning yields a symmetric condition for low type informed traders.

(15)   \begin{align*} \kappa \cdot w_H(p) &= w_H'(p) \cdot \mu(p) + \beta \cdot \left( w_H( a(p)) - w_H(p) \right) + \beta \cdot \left( w_H( b(p) ) - w_H(p) \right) \\ \kappa \cdot w_L(p) &= w_L'(p) \cdot \mu(p) + \beta \cdot \left( w_L( a(p) ) - w_L(p) \right) + \beta \cdot \left( w_L( b(p) ) - w_L(p) \right) \end{align*}

This combination of 5 conditions pins down the equilibrium.

Proposition: If the trading strategies are admissible, w_H is a non-increasing function of p, w_L is a non-decreasing function of p, both value functions satisfy the 5 conditions above, and the trading strategies are continuously differentiable on the interval (0,1), then the trading strategies are optimal for all t.

In the section below, I solve for the equilibrium trading intensities and prices numerically.

4. Numerical Solution

In the results below, I set \beta = 1/2 and \kappa = 1 for simplicity. I compute the value functions w_H and w_L as well as the optimal trading strategies \Theta on a grid over the unit interval with N nodes. Let \mathbf{P} denote the vector of N prices.

(16)   \begin{align*} \mathbf{P} = \begin{bmatrix} p_1 & p_2 & \cdots & p_N \end{bmatrix} \end{align*}

Let \mathbf{W}_H(\mathbf{P};\mathtt{i}) and \mathbf{W}_L(\mathbf{P};\mathtt{i}) denote the vector of value function levels over each point in the price grid \mathbf{P} after iteration \mathtt{i}. I use the teletype style \mathtt{i} to denote the number of iterations in the optimization algorithm, so w_H(p_n;\mathtt{i}) denotes the level of the value function at price point p_n after \mathtt{i} iterations.

(17)   \begin{align*} \mathbf{W}_H(\mathbf{P};\mathtt{i}) &= \begin{bmatrix} w_H(p_1;\mathtt{i}) & w_H(p_2;\mathtt{i}) & \cdots & w_H(p_N;\mathtt{i}) \end{bmatrix} \\ \mathbf{W}_L(\mathbf{P};\mathtt{i}) &= \begin{bmatrix} w_L(p_1;\mathtt{i}) & w_L(p_2;\mathtt{i}) & \cdots & w_L(p_N;\mathtt{i}) \end{bmatrix} \end{align*}

The algorithm below computes w_H(p), w_L(p), a(p) and b(p). The equilibrium trading intensities can be derived from these values analytically. I seed the algorithm with initial guesses for \mathbf{W}_H(\mathbf{P};\mathtt{0}) and \mathbf{W}_L(\mathbf{P};\mathtt{0}).

(18)   \begin{align*} w_H(p_n;\mathtt{0}) &= e^{10 \cdot (1 - p_n)} - 1 \\ w_L(p_n;\mathtt{0}) &= e^{10 \cdot p_n} - 1 \end{align*}

Then, I iterate on these value function guesses until the adjustment error \Gamma(\mathtt{i}), which I define in Step 6 below, is sufficiently small. The estimation strategy uses the fixed point problem in Equation (13) to compute a(p) and b(p) given w_H(p) and w_L(p). It then separately uses the martingale condition in Equation (9) to compute the drift in the price level. In each step, the algorithm first computes how badly the no trade indifference condition in Equation (15) is violated. It then lowers the values of w_H(p) for p near 1 when the high type informed trader is too eager to trade and raises them when he is too apathetic about trading, and vice versa for the low type trader. Along the way, the algorithm checks that neither informed trader type has an incentive to bluff.

x-axis: Price of risky asset. Panel (a): Value function for the high (red) and low (blue) type informed trader. Panel (b): Bid (red) and ask (blue) prices for the risky asset. Panel (c): Between trade price drift.

Below I outline the estimation procedure in complete detail. Code for the simulation can be found on my GitHub site.

\mathtt{while} (\Gamma(\mathtt{i}) > \mathtt{tol}) \ \{

Step 1. Numerically compute w_H'(p_n;\mathtt{i}) and w_L'(p_n;\mathtt{i}) at each point.

(19)   \begin{align*} w_H'(p_n;\mathtt{i}) &= \frac{1}{2} \cdot \left( \frac{w_H(p_{n+1};\mathtt{i}) - w_H(p_n;\mathtt{i})}{p_{n+1} - p_n} + \frac{w_H(p_n;\mathtt{i}) - w_H(p_{n-1};\mathtt{i})}{p_n - p_{n-1}} \right) \\ w_L'(p_n;\mathtt{i}) &= \frac{1}{2} \cdot \left( \frac{w_L(p_{n+1};\mathtt{i}) - w_L(p_n;\mathtt{i})}{p_{n+1} - p_n} + \frac{w_L(p_n;\mathtt{i}) - w_L(p_{n-1};\mathtt{i})}{p_n - p_{n-1}} \right) \end{align*}

I fill in each of the boundary derivatives manually.

(20)   \begin{align*} w_H'(0;\mathtt{i}) &= -\infty, \qquad w_L'(1;\mathtt{i}) = \infty \\ w_H'(1;\mathtt{i}) &= w_L'(0;\mathtt{i}) = 0 \end{align*}

Step 2. Solve for bid and ask prices using Equation (13).

(21)   \begin{align*} a(p_n;\mathtt{i}) &= \arg_a \left\{ w_H(p_n;\mathtt{i}) = \left(1 - a\right) + w_H(a;\mathtt{i})  \right\} \\ b(p_n;\mathtt{i}) &= \arg_b \left\{ w_L(p_n;\mathtt{i}) = b + w_L(b;\mathtt{i})  \right\} \end{align*}

I interpolate the value function levels at w_H(a;\mathtt{i}) and w_L(b;\mathtt{i}) linearly. Let p_n be the closest price level to a such that a > p_n and let p_m be the closest price level to b such that p_m > b.

(22)   \begin{align*} w_H(a;\mathtt{i}) &= w_H(p_n;\mathtt{i}) + w_H'(p_n;\mathtt{i}) \cdot \left\{ a - p_n \right\} \\ w_L(b;\mathtt{i}) &= w_L(p_m;\mathtt{i}) - w_L'(p_m;\mathtt{i}) \cdot \left\{ p_m - b \right\} \end{align*}

Step 3. Compute \mu(p_n;\mathtt{i}) using Equation (9).

(23)   \begin{align*} 0 &= \mu(p_n;\mathtt{i}) + \lambda_a(p_n;\mathtt{i}) \cdot \left\{ a(p_n;\mathtt{i}) - p_n \right\} + \lambda_b(p_n;\mathtt{i}) \cdot \left\{ b(p_n;\mathtt{i}) - p_n \right\} \end{align*}

Imposing no bluffing, so that \theta_{L,B} = \theta_{H,S} = 0, I then plug in Equation (10) to compute \lambda_a(p_n;\mathtt{i}) and \lambda_b(p_n;\mathtt{i}).

(24)   \begin{align*} \lambda_a(p_n;\mathtt{i}) &= p_n \cdot \theta_{H,B}(p_n;\mathtt{i}) + \beta \\ \lambda_b(p_n;\mathtt{i}) &= (1 - p_n) \cdot \theta_{L,S}(p_n;\mathtt{i}) + \beta \end{align*}

I then use Equation (7) to solve for \theta_{H,B}(p_n;\mathtt{i}) and \theta_{L,S}(p_n;\mathtt{i}) in terms of only prices.

(25)   \begin{align*} \theta_{H,B} ( p_n;\mathtt{i})  &= \frac{\beta \cdot \left\{ p_n - a(p_n;\mathtt{i}) \right\}}{p_n \cdot \left( a(p_n;\mathtt{i}) - 1 \right)} \\ \theta_{L,S} (p_n;\mathtt{i}) &= \frac{\beta \cdot \left\{ p_n - b(p_n;\mathtt{i}) \right\}}{( 1 - p_n ) \cdot b(p_n;\mathtt{i})} \end{align*}

Combining these equations leaves a formulation for \mu(p_n;\mathtt{i}) which contains only prices.

(26)   \begin{align*} \mu(p_n;\mathtt{i}) &= \left( (1 - p_n) \cdot \theta_{L,S}(p_n;\mathtt{i}) + \beta \right) \cdot \left\{ p_n - b(p_n;\mathtt{i}) \right\} \\ &\qquad \qquad - \left( p_n \cdot \theta_{H,B}(p_n;\mathtt{i}) + \beta \right) \cdot \left\{ a(p_n;\mathtt{i}) - p_n \right\} \\ &= \left( \frac{\beta \cdot \left\{ p_n - b(p_n;\mathtt{i}) \right\}}{b(p_n;\mathtt{i})} + \beta \right) \cdot \left\{ p_n - b(p_n;\mathtt{i}) \right\} \\ &\qquad \qquad - \left(\frac{\beta \cdot \left\{ p_n - a(p_n;\mathtt{i}) \right\}}{a(p_n;\mathtt{i}) - 1} + \beta \right) \cdot \left\{ a(p_n;\mathtt{i}) - p_n \right\} \\ &= \frac{\beta \cdot p_n \cdot \left\{ p_n - b(p_n;\mathtt{i}) \right\}}{b(p_n;\mathtt{i})} - \frac{\beta \cdot \left( 1 - p_n \right) \cdot \left\{ a(p_n;\mathtt{i}) - p_n \right\}}{1 - a(p_n;\mathtt{i})} \end{align*}

Step 4. At each p_n for n \in \{1,\ldots,N\}, set \alpha = 0.10 and ensure that Equation (14) is satisfied. If the high type informed traders want to sell at price p_n, increase their value function at price p_n by \alpha = 10\%.

(27)   \begin{align*} &\mathtt{if} \Big[ w_H(p_n;\mathtt{i}) < \left(  b(p_n;\mathtt{i}) - 1 \right) + w_H \left( b(p_n;\mathtt{i});\mathtt{i}\right) \Big] \ \{ \\ &\qquad \qquad \mu(p_n;\mathtt{i}) = \left( 1 + \alpha \right) \cdot \mu(p_n;\mathtt{i}) \\ &\} \end{align*}

If the low type informed traders want to buy at price p_n, decrease their value function at price p_n by \alpha = 10\%.

(28)   \begin{align*} &\mathtt{if} \Big[ w_L(p_n;\mathtt{i}) < -a(p_n;\mathtt{i}) + w_L \left( a(p_n;\mathtt{i});\mathtt{i} \right) \Big] \ \{ \\ &\qquad \qquad \mu(p_n;\mathtt{i}) = \left( 1 - \alpha \right) \cdot \mu(p_n;\mathtt{i}) \\ &\} \end{align*}

Step 5. Update w_H(p_n;\mathtt{i}) and w_L(p_n;\mathtt{i}) by adding \varsigma = 5\% times the between trade indifference error from Equation (15).

(29)   \begin{align*} w_H(p_n;\mathtt{i+1}) &= w_H(p_n;\mathtt{i}) + \varsigma \cdot \left\{w_H'(p_n;\mathtt{i}) \cdot \mu(p_n;\mathtt{i})  - \kappa \cdot w_H(p_n;\mathtt{i}) \right\} \\ &\qquad \qquad + \varsigma \cdot \beta \cdot \left\{ w_H\left( a(p_n;\mathtt{i});\mathtt{i} \right) + w_H\left( b(p_n;\mathtt{i});\mathtt{i} \right) - 2 \cdot w_H(p_n;\mathtt{i}) \right\} \\ w_L(p_n;\mathtt{i+1}) &= w_L(p_n;\mathtt{i}) + \varsigma \cdot \left\{w_L'(p_n;\mathtt{i}) \cdot \mu(p_n;\mathtt{i})  - \kappa \cdot w_L(p_n;\mathtt{i}) \right\} \\ &\qquad \qquad + \varsigma \cdot \beta \cdot \left\{ w_L\left( a(p_n;\mathtt{i});\mathtt{i} \right) + w_L\left( b(p_n;\mathtt{i});\mathtt{i} \right) - 2 \cdot w_L(p_n;\mathtt{i}) \right\} \end{align*}

Step 6. Evaluate update error.

(30)   \begin{align*} \Gamma(\mathtt{i}) &= \sqrt{ \frac{1}{N} \cdot \left\{ \sum_{n=1}^N \left(w_L(p_n;\mathtt{i+1}) - w_L(p_n;\mathtt{i}) \right)^2 + \sum_{n=1}^N \left(w_H(p_n;\mathtt{i+1}) - w_H(p_n;\mathtt{i}) \right)^2 \right\} } \end{align*}

\}
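Putting Steps 1 through 6 together, the sketch below performs one pass of the loop in plain Python. It is a hypothetical illustration, not the GitHub code referenced above. It differs from the procedure in several ways:
- It updates interior grid nodes only, so the infinite boundary derivatives in Equation (20) are never needed.
- It replaces the derivative-based rule in Equation (22) with piecewise-linear interpolation between grid nodes.
- It solves Equation (21) by bisection.
- It assumes w_H(1) = w_L(0) = 0 with both value functions positive elsewhere, so each bisection bracket contains a sign change.

The parameter values (\beta = 1/2, \kappa = 1, \alpha = 10\%, \varsigma = 5\%) follow the text, while the function names are illustrative. Calling `one_pass` repeatedly until \Gamma falls below a tolerance mirrors the outer while loop. No claim is made here about how many passes convergence takes.

```python
import bisect
import math

def interp(grid, vals, x):
    """Piecewise-linear interpolation on an increasing grid (stand-in for Eq. (22))."""
    j = min(max(bisect.bisect_right(grid, x) - 1, 0), len(grid) - 2)
    w = (x - grid[j]) / (grid[j + 1] - grid[j])
    return (1 - w) * vals[j] + w * vals[j + 1]

def bisect_root(f, lo, hi, tol=1e-12):
    """Root of f on [lo, hi], assuming f changes sign on the bracket."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def slope(grid, vals, n):
    """Step 1, Eq. (19): average of forward and backward difference quotients."""
    fwd = (vals[n + 1] - vals[n]) / (grid[n + 1] - grid[n])
    bwd = (vals[n] - vals[n - 1]) / (grid[n] - grid[n - 1])
    return 0.5 * (fwd + bwd)

def one_pass(grid, w_h, w_l, beta=0.5, kappa=1.0, alpha=0.10, step=0.05):
    """Steps 1-6 at interior nodes; returns updated w_H, w_L and Gamma."""
    new_h, new_l = list(w_h), list(w_l)
    for n in range(1, len(grid) - 1):
        p, wh, wl = grid[n], w_h[n], w_l[n]
        dh, dl = slope(grid, w_h, n), slope(grid, w_l, n)
        # Step 2: quotes from the indifference conditions, Eqs. (13) and (21).
        a = bisect_root(lambda x: (1 - x) + interp(grid, w_h, x) - wh, p, 1.0)
        b = bisect_root(lambda x: x + interp(grid, w_l, x) - wl, 0.0, p)
        # Step 3: drift in terms of prices only, last line of Eq. (26).
        mu = beta * p * (p - b) / b - beta * (1 - p) * (a - p) / (1 - a)
        wh_a, wh_b = interp(grid, w_h, a), interp(grid, w_h, b)
        wl_a, wl_b = interp(grid, w_l, a), interp(grid, w_l, b)
        # Step 4: no-bluffing checks, Eqs. (27)-(28).
        if wh < (b - 1) + wh_b:
            mu *= 1 + alpha
        if wl < -a + wl_a:
            mu *= 1 - alpha
        # Step 5: add step times the Eq. (15) residual, Eq. (29).
        new_h[n] = wh + step * (dh * mu - kappa * wh + beta * (wh_a + wh_b - 2 * wh))
        new_l[n] = wl + step * (dl * mu - kappa * wl + beta * (wl_a + wl_b - 2 * wl))
    # Step 6: root-mean-square change in both value functions, Eq. (30).
    sq = sum((x - y) ** 2 for x, y in zip(new_h, w_h))
    sq += sum((x - y) ** 2 for x, y in zip(new_l, w_l))
    return new_h, new_l, math.sqrt(sq / len(grid))

# Grid and initial guesses from Eqs. (16) and (18).
N = 101
grid = [n / (N - 1) for n in range(N)]
w_h = [math.exp(10 * (1 - p)) - 1 for p in grid]
w_l = [math.exp(10 * p) - 1 for p in grid]
w_h, w_l, gamma = one_pass(grid, w_h, w_l)
print(gamma)
```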

Protected: Notes: Levy (2010)



CRSP Data Summary Statistics by Industry

1. Introduction

In this post, I compute industry-level summary statistics for the CRSP monthly file using 2 different industry classification schemes:

  1. Fama and French (1988)
  2. Moskowitz and Grinblatt (1999)

All of the code for the results below as well as a JSON file containing the industry classification schemes can be found at my GitHub page.

2. Data

In this section, I describe my data sources for the plots below.

CRSP Monthly File

I gather my stock data from the CRSP monthly file via the WRDS database. Thus, the unit of observation is a firm \times month pair. I restrict my attention to the time period from January 1988 to December 2010 to focus on the period of time over which the Fama and French (1988) industry classification scheme would have been widely known. I keep only actively traded firms listed on the NYSE, NASDAQ and AMEX exchanges. I require that the firm reports a non-missing price, return, share count and SIC code for a given month. I also remove any observations which lack valid data in the previous month. This leaves me with 1,916,707 total firm \times month observations covering 20,686 firms. The figure below plots the total number of firms in the dataset each month.
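The sample filters above can be sketched in plain Python. This is a hypothetical illustration, not the code on the GitHub page mentioned earlier. It makes several assumptions:
- Each record is a dict with CRSP-style fields `permno`, `date` (a (year, month) tuple), `exchcd`, `prc`, `ret`, `shrout` and `siccd`.
- Exchange codes 1, 2 and 3 identify NYSE, AMEX and NASDAQ.
- Missing values are `None`.

The "actively traded" screen is not modeled separately.

```python
SAMPLE_START, SAMPLE_END = (1988, 1), (2010, 12)
EXCHANGES = {1, 2, 3}                      # assumed EXCHCD codes: NYSE, AMEX, NASDAQ
REQUIRED = ("prc", "ret", "shrout", "siccd")

def month_number(date):
    """Map a (year, month) tuple to a running month count."""
    year, month = date
    return 12 * year + month - 1

def passes_screens(rec):
    """Listed on a kept exchange with non-missing price, return, shares and SIC code."""
    return rec["exchcd"] in EXCHANGES and all(rec.get(k) is not None for k in REQUIRED)

def filter_panel(records):
    """Keep in-sample firm-months that pass the screens and whose prior month also passes."""
    valid = {(r["permno"], month_number(r["date"])): r
             for r in records if passes_screens(r)}
    kept = [r for (permno, m), r in valid.items()
            if SAMPLE_START <= r["date"] <= SAMPLE_END and (permno, m - 1) in valid]
    return sorted(kept, key=lambda r: (r["permno"], r["date"]))
```

The prior-month check runs on all valid records, including those before January 1988. As a result, firms observed in December 1987 are not dropped at the start of the window. Counting the records returned and the distinct `permno` values among them gives the two totals in the form reported above.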

Number of firms in the monthly CRSP database from January 1988 to December 2010.

Industry Classifications

I created a JSON file to house CRSP-COMPUSTAT industry classification data. The data can be found in various places throughout the web; e.g., see Ken French’s website used in Fama and French (1988). However, everywhere I looked, the data came as a txt file with quirky formatting. For example, below is the first industry coding from the file on Ken French’s site:

 1 Agric  Agriculture
          0100-0199 Agric production - crops
          0200-0299 Agric production - livestock
          0700-0799 Agricultural services
          0910-0919 Commercial fishing
          2048-2048 Prepared feeds for animals
 ...

This format is particularly difficult to read because it is irregularly spaced and has little mark-up around the data. In response to this problem, I used Emacs regular expressions to convert the file on Ken French’s website into a JSON format. I also coded up the 20-industry classification used by Moskowitz and Grinblatt (1999). The JSON file contains 2 top-level entries, one for the Fama and French (1988) industry classification scheme using 49 different clusters and one for the Moskowitz and Grinblatt (1999) scheme with 20 different clusters. The industry groupings are based on SIC codes. Below I post a sample entry for the \mathtt{Agriculture} industry from the Fama and French (1988) scheme:

{"Fama and French (1988)": {
    "Agriculture": {
	"Agric production - crops": {"start":100, "end":199},
	"Agric production - livestock": {"start":200, "end":299},
	"Agricultural services": {"start":700, "end":799},
	"Commercial fishing": {"start":910, "end":919},
	"Prepared feeds for animals": {"start":2048, "end":2048}
    },
    ...
}

Note that under the main heading there are several subindustry headings. The \mathtt{start} and \mathtt{end} tags denote the initial and ending SIC codes for each subindustry. The Moskowitz and Grinblatt (1999) scheme is less complex: there is a single start and end SIC code for each of the 20 broad industry groupings:

 "Moskowitz and Grinblatt (1999)": {
     "Mining": {"start":1000, "end":1499},
     "Food": {"start":2000, "end":2099},
     "Apparel": {"start":2200, "end":2399},
     "Paper": {"start":2600, "end":2699},
     "Chemical": {"start":2800, "end":2899},
     "Petroleum": {"start":2900, "end":2999},
     "Construction": {"start":3200, "end":3299},
     "Prim. Metals": {"start":3300, "end":3399},
     "Fab. Metals": {"start":3400, "end":3499},
     "Machinery": {"start":3500, "end":3599},
     "Electrical Eq.": {"start":3600, "end":3699},
     "Transportation Eq.": {"start":3700, "end":3799},
     "Manufacturing": {"start":3800, "end":3999},
     "Railroads": {"start":4000, "end":4099},
     "Other Transport.": {"start":4100, "end":4799},
     "Utilities": {"start":4900, "end":4999},
     "Retail": {"start":5000, "end":5299},
     "Dept. Stores": {"start":5300, "end":5399},
     "Retail": {"start":5400, "end":5999},
     "Financial": {"start":6000, "end":6999}
 }
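Here is a hypothetical Python sketch of a scripted alternative to the Emacs conversion. It is an illustration, not the method used to build the JSON file, and it has three functions:
- `parse_french_txt` turns the fixed-format text into the nested layout shown above.
- `ranges_from_json` flattens either scheme into (industry, start, end) triples.
- `industry_for_sic` uses those triples for SIC lookups.

One caveat the sketch handles: the Moskowitz and Grinblatt (1999) entry lists \mathtt{Retail} twice. Python’s `json.loads` keeps only the last value of a repeated key by default, so the sketch passes `object_pairs_hook=list` to keep both SIC ranges. The regular expressions assume the spacing of the sample above and may need adjusting for other lines of the file.

```python
import json
import re

HEADER = re.compile(r"^\s*(\d+)\s+(\S+)\s+(.+?)\s*$")   # e.g. " 1 Agric  Agriculture"
RANGE = re.compile(r"^\s*(\d{4})-(\d{4})\s*(.*?)\s*$")  # e.g. "0100-0199 Agric production - crops"

def parse_french_txt(text):
    """Fixed-format industry file -> {industry: {subindustry: {"start", "end"}}}."""
    industries, current = {}, None
    for line in text.splitlines():
        m = RANGE.match(line)
        if m and current is not None:
            industries[current][m.group(3)] = {"start": int(m.group(1)), "end": int(m.group(2))}
            continue
        m = HEADER.match(line)
        if m:
            current = m.group(3)            # long industry name, e.g. "Agriculture"
            industries[current] = {}
    return industries

def ranges_from_json(text, scheme):
    """Flatten one scheme to (industry, start, end) triples, keeping repeated keys."""
    top = dict(json.loads(text, object_pairs_hook=list))   # objects become lists of pairs
    triples = []
    for name, body in top[scheme]:
        fields = dict(body)
        if "start" in fields:               # Moskowitz-Grinblatt: one range per entry
            triples.append((name, fields["start"], fields["end"]))
        else:                               # Fama-French: nested sub-industries
            for _, sub in body:
                sub = dict(sub)
                triples.append((name, sub["start"], sub["end"]))
    return triples

def industry_for_sic(sic, triples):
    """First industry whose inclusive SIC range contains sic, or None."""
    for name, start, end in triples:
        if start <= sic <= end:
            return name
    return None

sample = " 1 Agric  Agriculture\n          0100-0199 Agric production - crops\n"
print(parse_french_txt(sample))
mg = ('{"Moskowitz and Grinblatt (1999)": {"Retail": {"start": 5000, "end": 5299}, '
      '"Dept. Stores": {"start": 5300, "end": 5399}, "Retail": {"start": 5400, "end": 5999}}}')
mg_ranges = ranges_from_json(mg, "Moskowitz and Grinblatt (1999)")
print(industry_for_sic(5100, mg_ranges), industry_for_sic(5500, mg_ranges))  # Retail Retail
```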

3. Fama and French (1988) Classification

In this section, I present 4 summary plots of the CRSP data split by the Fama and French (1988) industry classification. First, I plot the number of firms in each industry. In all of the plots, I omit the “Other” industry containing firms with no clear industry classification. All of the 48 industries except for Candy and Soda, Coal, Non-Metallic and Industrial Mining, Pharmaceutical Products, Precious Metals and Trading display a single-peaked pattern, indicating that the number of firms in each industry expanded dramatically until around 2000 and declined afterwards.

Number of firms in the monthly CRSP database from January 1988 to December 2010 by Fama and French (1988) industry classification system.

Next, I break these firm counts down further by sub-industry in the figure below. This plot reveals wide variation in the number of sub-industries per industry. What’s more, the single-peaked pattern does not persist as strongly at the sub-industry level.

Number of firms in the monthly CRSP database from January 1988 to December 2010 by Fama and French (1988) industry classification system split by sub-industry.

I then turn to market capitalization by industry rather than firm counts. In the figure below, we see that while the number of firms in most industries has been shrinking since 2000, each industry’s market capitalization has been rising steadily. Taken together, the firm counts and market capitalizations suggest that industries have been consolidating.
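Market capitalization is not defined explicitly above. A common construction from the CRSP monthly file is |PRC| \times SHROUT: a negative PRC flags a bid/ask midpoint rather than a closing price, and SHROUT is in thousands of shares, so the product is in thousands of dollars. The sketch below assumes that construction. It also assumes hypothetical records carrying `siccd`, `date`, `prc` and `shrout` fields, and an `industry_of` callable that stands in for any SIC-to-industry lookup. It sums firm values into industry-month cells and skips the omitted “Other” group.

```python
from collections import defaultdict

def market_cap(prc, shrout):
    """Firm value in $ thousands, assuming CRSP conventions for PRC and SHROUT."""
    return abs(prc) * shrout          # abs() undoes the negative-price midpoint flag

def industry_market_caps(records, industry_of):
    """Sum firm market caps into {(industry, date): total} cells."""
    totals = defaultdict(float)
    for rec in records:
        industry = industry_of(rec["siccd"])
        if industry is None or industry == "Other":   # the plots omit "Other"
            continue
        totals[(industry, rec["date"])] += market_cap(rec["prc"], rec["shrout"])
    return dict(totals)
```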

Market capitalization in the monthly CRSP database from January 1988 to December 2010 by Fama and French (1988) industry classification system.

Finally, I look at the distribution of monthly excess returns by industry, defined as r_{a,t} - r_{f,t} where r_{f,t} is the 3-month T-bill rate. Due to space constraints, it was not possible to plot 1 box plot for each month of observations. Instead, I first computed the mean monthly excess return for each firm in each year and then computed yearly box plots. Thus, a data point in the plot below is the mean monthly excess return for a particular firm over the whole year. This figure reveals that there are large outliers in the return distribution that need to be addressed before any further data work can be done. For instance, in 1992 a firm in the entertainment industry earned an average monthly excess return of over 800\%.
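The two-stage aggregation behind these box plots can be sketched as follows. The sketch assumes hypothetical records with `permno`, `date` as a (year, month) tuple, a monthly return `ret` and a matched monthly T-bill return `rf` in the same units. `statistics.quantiles` with its default exclusive method is only one of several quartile conventions, so box edges may differ slightly from those drawn by a plotting library.

```python
from collections import defaultdict
from statistics import mean, quantiles

def firm_year_means(records):
    """Stage 1: mean monthly excess return (ret - rf) for each (permno, year)."""
    cells = defaultdict(list)
    for rec in records:
        cells[(rec["permno"], rec["date"][0])].append(rec["ret"] - rec["rf"])
    return {key: mean(vals) for key, vals in cells.items()}

def yearly_box_stats(means):
    """Stage 2: quartiles of the firm-year means within each year."""
    by_year = defaultdict(list)
    for (permno, year), m in means.items():
        by_year[year].append(m)
    # quantiles() needs at least two data points per year.
    return {year: quantiles(vals, n=4) for year, vals in by_year.items() if len(vals) >= 2}
```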

Mean return by year for firms in the monthly CRSP database from January 1988 to December 2010 by Fama and French (1988) industry classification system.

4. Moskowitz and Grinblatt (1999) Classification

I also create similar plots for the industry classification system used in Moskowitz and Grinblatt (1999), which contains only 20 industries rather than the 48 in the Fama and French (1988) system. These charts generally mirror the insights from above, just at a much coarser level. First, I plot the number of firms in each of the 20 industries. These industry groupings were chosen in part to balance out the partition of firms across industries and, as a result, display a much more even cross-sectional distribution.

Number of firms in the monthly CRSP database from January 1988 to December 2010 by Moskowitz and Grinblatt (1999) industry classification system.

Again, the market capitalization plot reveals that firms have been consolidating within each industry since 2000.

Market capitalization in the monthly CRSP database from January 1988 to December 2010 by Moskowitz and Grinblatt (1999) industry classification system.

Finally, I plot the distribution of excess returns for each firm within industry as above.

Mean return by year for firms in the monthly CRSP database from January 1988 to December 2010 by Moskowitz and Grinblatt (1999) industry classification system.

Notes: Vazquez (2011)

1. Introduction

In this note, I outline the main results in Scale Invariance, Bounded Rationality and Non-Equilibrium Economics (WP, 2009) by Sam Vazquez for use in a 5-minute presentation in Prof. Sargent’s reading group.

This paper presents an agent-based model (i.e., there are a finite number of agents) which makes 2 basic assumptions:

  1. Agents have a scale invariant utility function–i.e., consuming 1 gallon of milk is as satisfying as consuming 3.8 litres of milk.
  2. Agents are boundedly rational and have beliefs about the exchange rates between a limited number of assets.

From these 2 assumptions, the author defines an ensemble of economic models using a variety of tools from physics (in particular, symmetry analysis, coarse graining and operator methods). The paper brings a new set of tools to bear on well-known economic problems and illustrates the mathematical symmetry across the class of economic operators commonly employed by economic analysts.

2. An Analogy

To motivate this analysis for readers with an economics background, I begin this note with an illustrative analogy. Physicists routinely model complex phenomena. For example, consider the task of computing the pressure that a gas such as oxygen exerts against the walls of a container such as a classroom. There are far too many individual oxygen molecules to count and keep track of at the micro level; however, physicists can compute macro-level properties of the gas, such as the average pressure exerted on the walls, by relying on 2 main modeling tricks.

First, they look for a key symmetry of the problem. This is more of an art than a science. In the thermodynamics example above, the symmetry is the fact that an oxygen molecule ought to behave in exactly the same way regardless of where it is in the room. For instance, there is no such thing as a “near-the-floor” or “by-the-window” oxygen molecule. This symmetry restricts the functional form of the equations we can use to model the movements and interactions of oxygen molecules and reduces the number of potential state variables.

Second, they look for an appropriate reference neighborhood within which to study the movement of oxygen molecules. For example, we know that the behavior of oxygen molecules in 1 corner of the room has a negligible impact on the behavior of the oxygen molecules in the far corner. Thus, we can study the local behavior of the molecules, taking the boundaries of each neighborhood as given, and then integrate up across all neighborhoods. This procedure is called coarse graining; it affects the scope of the approximations rather than the functional form of the equations.

Now, let’s consider how to apply these principles to a financial model, following the lead of Vazquez (2009). First, consider the problem of finding a symmetry. Vazquez chooses a scale symmetry whereby the unit of measure should not affect the utility of an agent. This assumption restricts the functional form of the space of viable utility functions. Next, consider the problem of coarse graining. In standard economic models, agents see and trade all assets. By analogy, this would be equivalent to directly linking all oxygen molecules in a room regardless of their distance. To break these connections and allow for coarse graining, Vazquez assumes that agents only carry around an information set containing a subset of all asset pairs, which he calls a “what-by-what” matrix. Thus, these 2 key assumptions allow Vazquez to use the statistical mechanics of fields to characterize the behavior of economic agents.

3. Economic Model

In this section, I tackle the basic modeling framework. Time is discrete and moves in integer steps.

3.1 Assets

There are \bar{a} different kinds of assets labeled by a = 1, 2, \ldots, \bar{a} with \mathcal{A} denoting the set of all assets. Agents get (possibly time dependent) utility from holding different combinations of the \bar{a} assets; however, this utility generating mechanism is left unspecified.

3.2 Agents

There is a finite number \bar{n} of agents indexed by n = 1, 2, \ldots, \bar{n}, with the set of all agents denoted by \mathcal{N}. Each agent n holds the quantity x_{n,a} of asset a, with X_n denoting the vector of holdings for agent n. The state of the economy at any point in time is given by the matrix \mathbf{X} = \left\{ X_n \mid n \in \mathcal{N} \right\}. Let \mathcal{X} denote the set of all possible states with \mathcal{X} = \mathbb{R}_+^{\bar{a} \times \bar{n}}.

Let \zeta_a \in \mathbb{R}_+ denote an asset specific positive scalar constant. Then, by scale invariance I mean that the economy should be unchanged if for every agent n \in \mathcal{N}, we multiply the agent’s holdings of asset a by \zeta_a:

(1)   \begin{align*} x_{n,a} \mapsto \zeta_a \cdot x_{n,a} \end{align*}

This is tantamount to saying that, if we measured an asset in centimeters instead of meters so that \zeta_a = 100, no real outcomes should change. This restriction implies that all essential functions will be homogeneous of degree 1.

3.3 Information

Agents have (possibly different) beliefs about how the future of the economy will play out, i.e., about how the state \mathbf{X} \in \mathcal{X} will evolve. Let \mathcal{I}_n denote the beliefs of agent n. Each agent’s information may be biased, narrow or wrong. At each point in time, agents have in mind an exchange rate matrix \mathbf{M}_n \in \mathcal{I}_n whose entries m_{n,a:b} denote the number of units of good a that agent n would accept in exchange for a unit of good b. Let \mathcal{A}_n denote the set of assets for which agent n has an entry in her \mathbf{M}_n matrix. Formally, \mathcal{I}_n is a \sigma-algebra over such matrices.

Each agent’s \mathbf{M}_n matrix has the following 3 properties:

  1. Reciprocality: Each agent is willing to buy and sell at the same price, i.e., there is no bid-ask spread.

    (2)   \begin{align*} m_{n,a:b} &= \frac{1}{m_{n,b:a}} \end{align*}

  2. Transitivity: There are no profit-generating trade combinations, i.e., there are no arbitrage opportunities.

    (3)   \begin{align*} m_{n,a:b} &= m_{n,c:b} \cdot m_{n,a:c} \end{align*}

  3. Scale Symmetry: Adjusting prices by the ratio \zeta_a/\zeta_b of scalar constants used to renormalize the asset units leaves the equilibrium allocations unchanged.

    (4)   \begin{align*} m_{n,a:b} &\mapsto \left( \frac{\zeta_a}{\zeta_b} \right) \cdot m_{n,a:b} \end{align*}

The \mathbf{M}_n matrices capture the idea that each agent’s field of vision or attention is bounded.
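One way to see that properties (2) to (4) hang together is to build an agent’s exchange rate matrix from a hypothetical vector of per-unit valuations w_a over the assets the agent tracks, setting m_{n,a:b} = w_b / w_a (the number of units of a that are worth one unit of b). This construction is an illustrative assumption, not Vazquez’s: it satisfies reciprocality and transitivity by construction, and re-measuring asset a so that holdings become \zeta_a \cdot x_{n,a} turns w_a into w_a / \zeta_a, which reproduces the scale symmetry in (4). Assets outside the agent’s \mathcal{A}_n simply have no entries, mirroring bounded attention.

```python
import math
from itertools import product


def exchange_matrix(values):
    """m[a, b] = units of asset a accepted for one unit of asset b.

    values: dict asset -> hypothetical per-unit valuation w_a, only for the assets
    the agent tracks (A_n); untracked assets get no entries at all.
    """
    return {(a, b): values[b] / values[a] for a, b in product(values, repeat=2)}


def is_reciprocal(m):
    # Property (2): m[a, b] = 1 / m[b, a], i.e., no bid-ask spread.
    return all(math.isclose(m[a, b] * m[b, a], 1.0, rel_tol=1e-12) for a, b in m)


def is_transitive(m):
    # Property (3): m[a, b] = m[c, b] * m[a, c] for every intermediate asset c.
    assets = {a for a, _ in m}
    return all(
        math.isclose(m[a, b], m[c, b] * m[a, c], rel_tol=1e-12)
        for a, b, c in product(assets, repeat=3)
    )


def rescale_units(values, zeta):
    # Holdings x_a -> zeta_a * x_a means each new unit is worth w_a / zeta_a.
    return {a: w / zeta.get(a, 1.0) for a, w in values.items()}


w = {"milk": 4.0, "bread": 3.0, "gold": 500.0}  # hypothetical valuations, milk in gallons
m = exchange_matrix(w)
zeta = {"milk": 3.785}                          # re-measure milk in litres
m_scaled = exchange_matrix(rescale_units(w, zeta))
print(is_reciprocal(m), is_transitive(m))
# Property (4): m[a, b] -> (zeta_a / zeta_b) * m[a, b]
print(all(
    math.isclose(m_scaled[a, b], zeta.get(a, 1.0) / zeta.get(b, 1.0) * m[a, b], rel_tol=1e-12)
    for a, b in m
))
```

An arbitrary matrix of quoted rates would generally fail `is_transitive`; deriving the rates from one valuation vector is simply the easiest way to produce a matrix that is arbitrage-free by design.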

3.4 Preferences

Given his information set \mathcal{I}_n, each agent behaves rationally in accordance with the von Neumann and Morgenstern axioms of decision theory[1], yielding an index of satisfaction V_n as defined below:

(5)   \begin{align*} V_n &: \mathcal{X}_n \times \mathcal{I}_n \mapsto \mathbb{R} \end{align*}

where V_n = \mathbb{E} \left[ U_n \mid \mathcal{I}_n \right] with U_n as agent n’s utility function. For each agent n \in \mathcal{N}, V_n has partial derivatives \partial_a V_n > 0 and \partial_a^2 V_n < 0 for each asset a \in \mathcal{A}, where \partial_a \equiv \partial / \partial x_{n,a} for brevity. Also, if between periods t and t+1 agent n changes his asset holdings from X_n to X_n', I abbreviate the corresponding change in satisfaction as:

(6)   \begin{align*} \Delta V_n &= V_n \left( X_n' \right) - V_n \left( X_n \right) \end{align*}

The fact that the economy must be scale invariant imposes additional restrictions on each agent’s utility function. In particular, a change of scale can shift the utility function by at most a constant, i.e., the utility function must be CRRA/log-like in nature, such as:

(7)   \begin{align*} U_n \left( X_n \right) &= \phi \cdot \ln \left[ \Psi_n^{\top} X_n \right] \end{align*}

where \Psi_n is a \bar{a} \times 1 vector of free parameters. For instance, consider the utility specification below, which equates agent n’s utility with the value of his asset holdings in terms of a numeraire good denoted by b=1:

(8)   \begin{align*} U_n &= \ln \left[ \sum_{a \in \mathcal{A}} \left( x_{n,a} \cdot m_{n,a:1} \right)^{\phi} \right] \end{align*}

Here, note that rescaling the units of 1 of the assets has no effect on the first-order condition. Note also that a key assumption in this class of models is that utility is myopic and depends only on the current period’s asset holdings, which fits nicely with Vazquez’s log utility assumption.
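The claim that a change of units can shift utility by at most a constant is easy to check numerically for an additive log form U_n = \sum_a \psi_a \ln x_{n,a}, used here as an assumed stand-in: it is log-like, but it is not equation (7) or (8). Rescaling x_{n,a} \mapsto \zeta_a \cdot x_{n,a} adds the constant \sum_a \psi_a \ln \zeta_a, so utility differences such as \Delta V_n, and hence first-order conditions, are unchanged. The weights, holdings and scale factors below are hypothetical.

```python
import math


def log_utility(holdings, psi):
    """Hypothetical additive log utility: sum_a psi_a * ln(x_a)."""
    return sum(p * math.log(x) for p, x in zip(psi, holdings))


def rescale(holdings, zeta):
    """Change of units x_a -> zeta_a * x_a."""
    return [z * x for z, x in zip(zeta, holdings)]


psi = [0.3, 0.7]
zeta = [100.0, 3.785]            # e.g., metres -> centimetres, gallons -> litres
x_old, x_new = [1.0, 2.0], [1.5, 1.6]

shift = log_utility(rescale(x_old, zeta), psi) - log_utility(x_old, psi)
gain = log_utility(x_new, psi) - log_utility(x_old, psi)
gain_rescaled = log_utility(rescale(x_new, zeta), psi) - log_utility(rescale(x_old, zeta), psi)

print(shift, sum(p * math.log(z) for p, z in zip(psi, zeta)))  # equal up to rounding
print(gain, gain_rescaled)                                      # the same Delta V, up to rounding
```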

3.5 Economic Operators

Economic activities are modeled as operators, i.e., mappings that change agents’ portfolio holdings. For instance, if \mathbb{H} is an arbitrary economic operator, then:

(9)   \begin{align*} \mathbb{H} &: \mathcal{X} \mapsto \mathcal{X} \end{align*}

Consider, for example, the following operators:

  1. A consumption operator \mathbb{C} which removes asset holdings but increases utility,
  2. A production operator \mathbb{Y} which recombines asset holdings at a net surplus,
  3. A trade operator \mathbb{T} which exchanges asset holdings between agents, or
  4. A depreciation operator \mathbb{D} which removes asset holdings but does not compensate agents with a utility boost.

Using this general framework, we can discuss similarities and differences across economic operators. For instance, the trade operator \mathbb{T} preserves the aggregate quantity of each asset, as it simply transfers assets between 2 agents. Thus, we can think of trade as a conservative operator. By contrast, the consumption, production and depreciation operators do not conserve aggregate asset quantities.
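The operator view can be sketched by treating the state \mathbf{X} as a list of per-agent holding vectors and each operator as a function that returns a new state. The two operators below are hypothetical illustrations, not the paper’s definitions: the trade has agent n give q units of asset a to agent n' and receive m \cdot q units of asset b in return (so \Delta x_{n,b} = -m \cdot \Delta x_{n,a}), and depreciation shrinks every holding by a common rate \delta. Comparing aggregate holdings before and after shows that the trade is conservative while depreciation is not.

```python
def trade(X, n, n2, a, b, q, m):
    """Hypothetical trade operator T: agent n gives q units of asset a to agent n2
    and receives m * q units of asset b in return. Returns a new state."""
    Y = [list(row) for row in X]          # copy, so the operator is a pure mapping X -> X
    Y[n][a] -= q
    Y[n2][a] += q
    Y[n][b] += m * q
    Y[n2][b] -= m * q
    if any(x < 0 for row in Y for x in row):
        raise ValueError("holdings must stay in R_+")
    return Y


def depreciate(X, delta):
    """Hypothetical depreciation operator D: every holding shrinks by the rate delta."""
    return [[(1.0 - delta) * x for x in row] for row in X]


def aggregates(X):
    """Total quantity of each asset across all agents (column sums)."""
    return [sum(col) for col in zip(*X)]


X = [[10.0, 5.0],   # agent 0 holds 10 units of asset 0 and 5 units of asset 1
     [4.0, 8.0]]    # agent 1
print(aggregates(X), aggregates(trade(X, 0, 1, 0, 1, 2.0, 1.5)))  # conserved
print(aggregates(depreciate(X, 0.1)))                              # not conserved
```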

In the analysis below, I overload the \Delta notation used to denote changes in satisfaction due to changes in holdings so that it is operator specific. In particular, for an arbitrary economic operator \mathbb{H} I define \Delta_{\mathbb{H}} as:

(10)   \begin{align*} \Delta_{\mathbb{H}} V_n &= \mathbb{E} \left[ U_n \left( \mathbb{H} X_n \right)  - U_n \left( X_n \right) \mid \mathcal{I}_n \right] \end{align*}
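Equation (10) is an expectation under the agent’s own beliefs, so it can be evaluated whenever those beliefs put probabilities on the outcomes of an operator. The sketch below assumes, hypothetically, the additive log utility U_n = \sum_a \psi_a \ln x_{n,a} and a depreciation operator whose rate \delta is uncertain, with agent n assigning subjective probabilities to a few values of \delta; none of these choices come from the paper.

```python
import math


def log_utility(holdings, psi):
    """Hypothetical additive log utility: sum_a psi_a * ln(x_a)."""
    return sum(p * math.log(x) for p, x in zip(psi, holdings))


def expected_gain(holdings, psi, outcomes):
    """Delta_H V_n = E[U_n(H X_n) - U_n(X_n) | I_n].

    outcomes: list of (probability, holdings after the operator) pairs that
    encode agent n's subjective beliefs about what H will do.
    """
    if not math.isclose(sum(prob for prob, _ in outcomes), 1.0):
        raise ValueError("subjective probabilities must sum to 1")
    base = log_utility(holdings, psi)
    return sum(prob * (log_utility(after, psi) - base) for prob, after in outcomes)


psi = [0.3, 0.7]
x = [10.0, 5.0]
beliefs = {0.0: 0.5, 0.2: 0.5}  # hypothetical: no depreciation or 20% depreciation, equally likely
outcomes = [(prob, [(1.0 - delta) * xa for xa in x]) for delta, prob in beliefs.items()]
print(expected_gain(x, psi, outcomes))  # negative: depreciation lowers expected satisfaction
```

For this utility and a uniform rate, each outcome shifts utility by (\sum_a \psi_a) \ln(1 - \delta), so the printed value should match 0.5 \cdot \ln 0.8 up to rounding.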

4. Examples

In Section 3 above, I defined the basic elements of a class of economic models and stopped just short of giving an equilibrium definition. In this section, I look at 2 examples of equilibria from this class of models. Each example represents a different take on what the equilibrium price object looks like. I do not consider any models with production, consumption or depreciation decisions.

4.1 Fixed Exchange Rate

First, consider a world with a single, agent-independent exchange rate for each pair of assets, set by fiat. In this world, we have the following equilibrium definition:

Definition (Equilibrium): An equilibrium is a \bar{n} \times \bar{a} matrix of allocations \mathbf{X} as well as a \bar{a} \times \bar{a} reciprocal matrix of exchange rates \mathbf{M} (i.e., m_{a:b} = 1/m_{b:a}) with a unit diagonal and \bar{a} \cdot (\bar{a} - 1)/2 unique off-diagonal elements such that, given the exchange rate matrix \mathbf{M}, the following hold:

  1. For each agent n \in \mathcal{N}, the allocation X_n satisfies:

    (11)   \begin{align*} 0 &= \frac{\partial \left( \Delta_{\mathbb{T}} V_n \right)}{\partial \left( \Delta x_{n,a}^* \right)} \end{align*}

  2. Markets clear such that for each a \in \mathcal{A}:

    (12)   \begin{align*} 0 &= \sum_{n \in \mathcal{N}} \Delta x_{n,a} \end{align*}

In such an economy, we can characterize the equilibrium allocations as follows:

Proposition (Equilibrium w/ Single Exchange Rate): Near the equilibrium, the amounts of assets a and b that agent n would trade given the agent-independent exchange rate m_{a:b} are given by:

(13)   \begin{align*} \Delta x_{n,a} &\approx \kappa_{n,a:b} \cdot \left\{ \partial_a V_n - m_{a:b} \cdot \partial_b V_n \right\} \\ \Delta x_{n,b} &\approx - m_{a:b} \cdot \Delta x_{n,a} \end{align*}

where \kappa_{n,a:b} is an agent-dependent positive constant.

This result follows directly from a second-order Taylor expansion of the utility gain from trading around the no-trade point \Delta x_{n,a} = 0:

Proof: Suppose that the trade operator \mathbb{T} changes agent n’s holdings of asset a by depositing \Delta x_{n,a}. Then, an equilibrium is a fixed point at which the following 2 properties hold:

(14)   \begin{align*} \frac{\partial \left( \Delta_{\mathbb{T}} V_n \right)}{\partial \left( \Delta x_{n,a}^* \right)} &= \left\{ \partial_a V_n - m_{a:b} \cdot \partial_b V_n  \right\} = 0 \\ \frac{\partial^2 \left( \Delta_{\mathbb{T}} V_n \right)}{\left\{\partial \left( \Delta x_{n,a}^* \right) \right\}^2} &< 0 \end{align*}

The first-order condition says that agent n no longer wants to trade, and the second-order condition says that we are at a local maximum. A Taylor expansion of \Delta_{\mathbb{T}} V_n around the no-trade point \Delta x_{n,a} = 0 yields first- and second-order terms:

(15)   \begin{align*} \Delta_{\mathbb{T}} V_n &\approx \Delta x_{n,a} \cdot \left\{ \partial_a V_n - m_{a:b} \cdot \partial_b V_n  \right\} \Big\vert_{\Delta x_{n,a} = 0} \\ &\qquad \qquad + \frac{\left(\Delta x_{n,a}\right)^2}{2} \cdot \left\{ \partial_a^2 V_n + m_{a:b}^2 \cdot \partial_b^2 V_n - 2 \cdot m_{a:b} \cdot \partial_a \partial_b V_n \right\}\Big\vert_{\Delta x_{n,a} = 0} + \ldots \end{align*}

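Differentiating this quadratic approximation with respect to \Delta x_{n,a}, setting the result to 0 and solving for \Delta x_{n,a} gives the first equation above, with \kappa_{n,a:b} equal to minus the inverse of the second-order coefficient in braces, which is positive by the second-order condition. The mirror-image expression for asset b follows from \Delta x_{n,b} = -m_{a:b} \cdot \Delta x_{n,a}.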
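To see how good this approximation is, the sketch below compares the perturbative trade implied by the Taylor expansion, \kappa_{n,a:b} \cdot \{\partial_a V_n - m_{a:b} \cdot \partial_b V_n\} with \kappa_{n,a:b} = -1/\{\partial_a^2 V_n + m_{a:b}^2 \cdot \partial_b^2 V_n - 2 \cdot m_{a:b} \cdot \partial_a \partial_b V_n\}, against the exact optimal trade for a single agent. It assumes a hypothetical additive log utility V_n = \psi_a \ln x_{n,a} + \psi_b \ln x_{n,b} (so the cross partial is 0) and the trade rule \Delta x_{n,b} = -m_{a:b} \cdot \Delta x_{n,a} from the proof; for this utility the exact maximizer has a closed form.

```python
def approx_trade(xa, xb, psi_a, psi_b, m):
    """Perturbative trade in asset a around the no-trade point (hypothetical log utility)."""
    grad = psi_a / xa - m * psi_b / xb                # first-order coefficient in braces
    curv = -psi_a / xa**2 - m**2 * psi_b / xb**2      # second-order coefficient; cross partial is 0
    kappa = -1.0 / curv                               # positive because curv < 0
    return kappa * grad


def exact_trade(xa, xb, psi_a, psi_b, m):
    """Exact maximizer over d of psi_a*ln(xa + d) + psi_b*ln(xb - m*d)."""
    return (psi_a * xb - m * psi_b * xa) / (m * (psi_a + psi_b))


for m in (1.0, 1.05, 1.5):
    print(m, approx_trade(1.0, 1.0, 1.0, 1.0, m), exact_trade(1.0, 1.0, 1.0, 1.0, m))
```

With these inputs the no-trade rate is m_{a:b} = 1; the two trades agree closely for rates near it and drift apart as the rate moves away, which is what “near the equilibrium” in the proposition signals.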

4.2 Barter

Next consider the case of pairwise trading between agents which I refer to as bartering. In this world, I assume that the bargaining process is exogenously specified and the agents take the split of the gains to trade as given. What’s more, because agents are perfectly myopic in their utility specifications, they have no concern for the matching process which assigns traders to new partners each period.

Definition (Equilibrium): An equilibrium is a \bar{n} \times \bar{a} matrix of allocations \mathbf{X} as well as a set of no more than \bar{n}! pairwise exchange rates m_{n:n',a:b} such that given each exchange rate, we have that:

  1. For each agent n \in \mathcal{N}, the allocation X_n satisfies:

    (16)   \begin{align*} 0 &= \frac{\partial \left( \Delta_{\hat{\mathbb{T}}} V_n \right)}{\partial \left( \Delta x_{n,a}^* \right)} \end{align*}

  2. Each pairwise market clears:

    (17)   \begin{align*} 0 &= \Delta x_{n,a} + \Delta x_{n',a} \end{align*}

In such a world, we have the following perturbative equilibrium result:

Proposition (Equilibrium w/ Barter): Near the equilibrium, the amounts of assets a and b that 2 agents n and n' exchange as a result of a bargaining process \hat{\mathbb{T}} are given by:

(18)   \begin{align*} \Delta x_{n,a} &\approx \hat{\kappa}_{n,a:b} \cdot \left\{ \partial_a \left( V_n - V_{n'} \right) \cdot \partial_b \left( V_n + V_{n'} \right) -  \partial_a \left( V_n + V_{n'} \right) \cdot \partial_b \left( V_n - V_{n'} \right) \right\} \\ \Delta x_{n,b} &\approx - \left( \frac{\partial_a \left( V_n + V_{n'} \right)}{\partial_b \left( V_n + V_{n'} \right)} \right) \cdot \Delta x_{n,a} \end{align*}

This result builds on the equilibrium characterization from above:

Proof: Since the agents are still (somewhat unrealistically) price takers in this world, we can use the equilibrium formulae from the proposition above; however, we now have additional information that we can use to further restrict the demand functionals. In particular, we know that every pairwise trade nets out to 0:

(19)   \begin{align*} 0 &= \Delta x_{n,a} + \Delta x_{n',a} \end{align*}

As a result, we can solve for the pairwise exchange rate by substituting the demand functionals of agents n and n' into this market-clearing condition and solving the resulting equation for the single unknown m_{n:n',a:b}:

(20)   \begin{align*} m_{n:n',a:b} &= \frac{\kappa_{n,a:b} \cdot \partial_a V_n + \kappa_{n',a:b} \cdot \partial_a V_{n'}}{\kappa_{n,a:b} \cdot \partial_b V_n + \kappa_{n',a:b} \cdot \partial_b V_{n'}} \end{align*}

Substituting this formula back into the equation for each agent’s demand function given an exogenous exchange rate yields the equilibrium result.
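Equation (20) can be checked numerically. The sketch below assumes, hypothetically, that both agents have additive log utility, evaluates each \kappa as minus the inverse of the second-order coefficient from the single-exchange-rate proof, and, because that coefficient itself depends on the exchange rate, iterates between \kappa and (20) a fixed number of times. This fixed-point loop is an illustrative device, not part of the paper. Whatever the loop converges to, the resulting trades satisfy the pairwise clearing condition (19) up to rounding, because (20) is derived from it.

```python
def grad_and_curv(x, psi):
    """First and second partials of hypothetical log utility psi_a*ln(x_a) + psi_b*ln(x_b)."""
    (xa, xb), (pa, pb) = x, psi
    return (pa / xa, pb / xb), (-pa / xa**2, -pb / xb**2)


def kappa(x, psi, m):
    """Minus the inverse of the second-order coefficient (cross partial is 0 here)."""
    _, (vaa, vbb) = grad_and_curv(x, psi)
    return -1.0 / (vaa + m**2 * vbb)


def pairwise_rate(x1, psi1, x2, psi2, m=1.0, iters=50):
    """Iterate kappa and equation (20); returns the rate and the kappas used to compute it."""
    (a1, b1), _ = grad_and_curv(x1, psi1)
    (a2, b2), _ = grad_and_curv(x2, psi2)
    for _ in range(max(1, iters)):
        k1, k2 = kappa(x1, psi1, m), kappa(x2, psi2, m)
        m = (k1 * a1 + k2 * a2) / (k1 * b1 + k2 * b2)  # equation (20)
    return m, k1, k2


def barter_trades(x1, psi1, x2, psi2):
    """Rate and asset-a trades for agents n and n' under the perturbative demands."""
    m, k1, k2 = pairwise_rate(x1, psi1, x2, psi2)
    (a1, b1), _ = grad_and_curv(x1, psi1)
    (a2, b2), _ = grad_and_curv(x2, psi2)
    dx1 = k1 * (a1 - m * b1)  # trade in asset a by agent n
    dx2 = k2 * (a2 - m * b2)  # trade in asset a by agent n'
    return m, dx1, dx2


m, dx1, dx2 = barter_trades((2.0, 1.0), (1.0, 1.0), (1.0, 3.0), (1.0, 1.0))
print(m, dx1, dx2, dx1 + dx2)  # dx1 + dx2 is 0 up to rounding
```

In this example agent n starts relatively rich in asset a, so dx1 comes out negative: n sells a to n', as the sign of \partial_a V_n - m \cdot \partial_b V_n suggests.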
