Notes: Gabaix (2012)

1. Introduction

In this post, I review the sparsity-based model of bounded rationality introduced in Gabaix (2011) and then extended in Gabaix (2012).

In the baseline framework presented in Gabaix (2011), a boundedly rational agent must choose an action to maximize his utility, where his optimal action may depend on many different variables. The agent chooses an action by following a $2$-step algorithm: first, he chooses a sparse representation of the world in which he completely ignores many of the variables that might affect his optimal action; second, he uses this endogenously chosen sparse representation to choose a boundedly rational action. Gabaix (2012) then shows how to use this sparsity-based framework to solve a dynamic programming problem.

2. Illustrative Example

In this section, I work through a simple example showing how to build bounded rationality into a decision maker’s problem in a tractable way. I begin by defining the problem. Consider the manager of a car factory, call him Bill, who gets to choose how many cars the factory should produce. Let $V(a;\mu,x)$ denote Bill’s value function in units of $\mathtt{utils}$ given an action $a$ over how many cars to produce:

(1) $\begin{align*} \max_a V(a;\mu,x) &= \max_a \left\{ \ - \frac{\gamma}{2} \cdot \left( \ a - \sum_{n=1}^N \mu_n \cdot x_n \right)^2 \ \right\} \end{align*}$

Here, the vector $x$ denotes a collection of factors that should enter into Bill’s decision process. For instance, he might worry about last month’s demand, $x_1$, the current US GDP, $x_2$, the recent increase in the cost of brake pads, $x_{30}$, and the completion of the new St. Croix River bridge in Minneapolis, MN, $x_{500}$. Likewise, the vector $\mu$ denotes how much weight Bill should place on each of the $N$ different elements that might enter into his decision process. Each element $\mu_n$ is in units that convert its decision factor into units of $\mathtt{cars}$. So, for example, $\mu_{500}$ would have units of $\mathtt{cars}/\mathtt{bridge \ completions}$ while $\mu_2$ would have units of $\mathtt{cars}/\$$. $\gamma$ is a constant with units of $\mathtt{utils}/\mathtt{cars}^2$ which balances the equation.

Next, consider Bill’s optimal and possible sub-optimal action choices. If Bill is completely unconstrained and can pick any $a$ whatsoever, he should choose:

(2) $\begin{align*} a(\mu;x) &= \sum_{n=1}^N \mu_n \cdot x_n \end{align*}$

where $a(\cdot;x)$ has units of $\mathtt{cars}$ . However, suppose that there were some constraints on Bill’s problem and he could not fully adjust his choice of how many cars to produce in response to every little piece of information in the vector $x$ . Let $a(m;x)$ denote his choice of how many cars to produce where $m_n = 0$ for most $n \in N$ :

(3) $\begin{align*} a(m;x) &= \sum_{n=1}^N m_n \cdot x_n \end{align*}$

For instance, if $m_2 \neq 0$ but $m_{30} = 0$ then Bill will adjust the number of cars he produces in response to a change in the US GDP but not in response to a change in the price of brake pads. Thus, Bill’s choice of how many cars to produce $a$ can be rewritten as a choice of how much he should weight each potential decision factor $x_n$.
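As a minimal sketch of equations (2) and (3), the snippet below compares the unconstrained action with a sparse one. All numbers are made up for illustration (they are not from Gabaix), and the helper name `action` is hypothetical.

```python
import numpy as np

# Hypothetical numbers chosen only for illustration; none come from the post.
rng = np.random.default_rng(0)
N = 500
mu = rng.normal(size=N)   # "true" weights mu_n
x = rng.normal(size=N)    # realized decision factors x_n

# Sparse weights: keep factors 1 and 2 (indices 0 and 1) and ignore the rest.
m = np.zeros(N)
m[:2] = mu[:2]

def action(weights, factors):
    """Equations (2) and (3): a = sum_n weights_n * factors_n."""
    return float(weights @ factors)

a_full = action(mu, x)    # unconstrained choice a(mu; x)
a_sparse = action(m, x)   # sparse choice a(m; x)

# Move an ignored factor (x_30, index 29) by one unit.
x_shift = x.copy()
x_shift[29] += 1.0
print(action(mu, x_shift) - a_full)    # equals mu_30 up to rounding
print(action(m, x_shift) - a_sparse)   # the sparse rule does not respond
```

The sparse rule responds only to the factors with non-zero weight, which is exactly the sense in which Bill "ignores" the price of brake pads.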

In order to complete the construction of Bill’s boundedly rational decision problem, I now have to define a loss function for Bill which trades off the benefits of choosing a weighting vector $m$ which is closer to the optimal choice $\mu$ against the cognitive costs of thinking about all of the nitty-gritty details of the $N$ dimensional vector of decision factors $x$. As a benchmark, suppose that there were no cognitive costs. In such a world, Bill would choose $a = a(\mu;x)$ as he would suffer a quadratic loss from deviating from this optimal strategy defined by the function $L(m,\mu)$ below, but no compensating “cognitive” gain from not having to think about how the construction of a bridge in Minneapolis should affect his production decision:

(4) $\begin{align*} L(m,\mu) &= \mathtt{E} \left[ \ V(a(\mu;x); \mu,x) - V(a(m;x); \mu,x) \ \right] \\ &= \frac{\gamma}{2} \cdot \mathtt{E} \left[ \ \left( \sum_{n=1}^N \left( m_n - \mu_n \right) \cdot x_n \right)^2 \ \right] \\ &= \frac{\gamma}{2} \cdot \sum_{n=1}^N \left( m_n - \mu_n \right)^2 \cdot 1_{\scriptscriptstyle \mathtt{dim}[x_n]}^2 \end{align*}$

where the term $1_{\scriptscriptstyle \mathtt{dim}[x_n]}^2$ is a placeholder which helps balance the units in the equation, and where the last line treats the decision factors $x_n$ as uncorrelated with unit variance. One method for incorporating cognitive costs into Bill’s decision problem would be to charge him $\kappa$ units of utility per $|m_n|^\alpha$ unit of emphasis on each decision factor. So, for example, if $\alpha = 2$ and Bill increased his production by $5 \mathtt{cars}/\$1\mathtt{B}$ in GDP growth, then Bill would pay a cognitive cost of $\kappa \cdot 5^2$ where $\kappa$ has units of $\mathtt{utils}$. Below, I formulate this boundedly rational loss function as a function of $\alpha$ which I denote $L^{\scriptscriptstyle \mathtt{BR}}(m,\mu;\alpha)$:

(5) $\begin{align*} L^{\scriptscriptstyle \mathtt{BR}}(m,\mu;\alpha) &= \min_{m \in \mathcal{R}^N} \left\{ \ \frac{\gamma }{2} \cdot \sum_{n=1}^N (m_n - \mu_n)^2 \cdot 1_{\scriptscriptstyle \mathtt{dim}[x_n]}^2 + \kappa \cdot \sum_{n=1}^N \left( |m_n| \cdot 1_{\scriptscriptstyle \frac{\mathtt{dim}[x_n]}{\mathtt{cars}}} \right)^\alpha \ \right\} \end{align*}$

where the term $1_{\scriptscriptstyle \mathtt{dim}[x_n] / \mathtt{cars}}$ is a constant with units $\mathtt{dim}[x_n] / \mathtt{cars}$ in order to balance the equation.
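The objective inside the braces of (5) is easy to evaluate numerically. The sketch below is an illustration only: the function name `br_loss` and all parameter values are hypothetical, the unit placeholders are set to $1$, and the decision factors are assumed to be independent standard normals so that the quadratic term in (4) collapses to a sum of squared weight gaps.

```python
import numpy as np

def br_loss(m, mu, alpha, kappa=1.0, gamma=1.0):
    """Objective in (5) with all unit placeholders set to 1."""
    m = np.asarray(m, dtype=float)
    mu = np.asarray(mu, dtype=float)
    fit = 0.5 * gamma * np.sum((m - mu) ** 2)
    if alpha == 0:
        # Convention |m_n|^0 = 1 if m_n != 0 and 0 otherwise.
        cost = kappa * np.count_nonzero(m)
    else:
        cost = kappa * np.sum(np.abs(m) ** alpha)
    return fit + cost

mu = np.array([2.0, 0.3])   # hypothetical "true" weights
m = np.array([1.0, 0.0])    # hypothetical sparse weights

# Monte Carlo check of (4), assuming x_n are i.i.d. standard normal.
rng = np.random.default_rng(0)
x = rng.standard_normal((200_000, 2))
mc_fit = 0.5 * np.mean((x @ (m - mu)) ** 2)
closed_fit = 0.5 * np.sum((m - mu) ** 2)

print(mc_fit, closed_fit)          # the two should be close
print(br_loss(m, mu, alpha=1))     # quadratic loss plus linear cost
```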

The key idea is that Bill’s decision problem is both convex and sparse only when $\alpha = 1$ . Below, I give $3$ examples which illustrate this fact:

Example (Quadratic Complexity Costs): First, consider the quadratic case when $\alpha = 2$. Then, taking the partial derivative of $L^{\scriptscriptstyle \mathtt{BR}}(m,\mu;2)$ with respect to an arbitrarily chosen dimension $n \in N$ and setting $\gamma = 1_{\scriptscriptstyle \mathtt{utils}/\mathtt{cars}^2}$ for simplicity, we get the optimality condition:

(6) $\begin{align*} 0 &= \partial_{m_n} L^{\scriptscriptstyle \mathtt{BR}}(m,\mu;2) \\ &= - (m_n - \mu_n) - 2 \cdot \kappa \cdot m_n \end{align*}$

This yields an internal optimum for Bill’s choice of $m_n$ :

(7) $\begin{align*} m_n &= \frac{\mu_n}{1 + 2 \cdot \kappa} \end{align*}$

While this is an easy solution to solve for, setting $\alpha = 2$ doesn’t yield any sparsity as $m_n \neq 0$ whenever $\mu_n \neq 0$ .
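A quick numerical check of (7), using hypothetical weights and $\gamma = 1$: the quadratic cost shrinks every weight toward $0$ by the same factor but never sets a non-zero weight exactly to $0$.

```python
import numpy as np

def quadratic_weights(mu, kappa):
    """Equation (7): interior optimum under quadratic costs (gamma = 1)."""
    return np.asarray(mu, dtype=float) / (1.0 + 2.0 * kappa)

mu = np.array([-2.0, -0.1, 0.0, 0.05, 3.0])   # hypothetical weights
kappa = 1.0
m = quadratic_weights(mu, kappa)

# First-order condition: (m_n - mu_n) + 2 * kappa * m_n = 0 for every n.
foc = (m - mu) + 2.0 * kappa * m

print(m)
print(np.count_nonzero(m), np.count_nonzero(mu))   # same number of non-zeros
```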

Graphically, in the above example, the boundedly rational agent will choose a weight $m_n$ at the point on the $x$ -axis in the figure below where the slope of the solid pink $|m_n|^2$ curve is exactly equal to the slope of the dashed black line.

The solid colored lines represent the cognitive costs faced by a boundedly rational agent with kappa = 1 when alpha = 2, 3, 4, 5 respectively. The dashed black line represents the gain to the boundedly rational agent to increasing the weight m_n on a particular decision factor when gamma = 1.

Example (Fixed Complexity Costs): On the other hand, let’s now think about the case where $\alpha = 0$ with the convention that $|m_n|^\alpha = 1_{m_n \neq 0}$. Here, I would again want to set:

(8) $\begin{align*} 0 &= \partial_{m_n} L^{\scriptscriptstyle \mathtt{BR}}(m,\mu;0) \end{align*}$

However, this problem is no longer convex as the costs to increasing $m_n$ a little bit away from $0$ will always outweigh the incremental benefits. Thus, I get a solution:

(9) $\begin{align*} m_n &= \begin{cases} \mu_n &\text{ if } |\mu_n| \geq \sqrt{2 \cdot \kappa} \\ 0 &\text{ else } \end{cases} \end{align*}$

While well posed, this non-convex problem is computationally hard: in general, the search over which variables to include grows combinatorially rather than linearly in the number of variables $N$. For example, see example $6.4$ in Boyd (2004) describing the regressor selection problem.
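The sketch below illustrates both points with hypothetical numbers and $\gamma = 1$. The function `hard_threshold` implements rule (9), while `brute_force` enumerates all $2^N$ candidate supports. In this separable example the two agree, so the enumeration is only there to show how a support search scales when no such shortcut is available.

```python
import itertools
import numpy as np

def hard_threshold(mu, kappa):
    """Rule (9): keep mu_n only if |mu_n| >= sqrt(2 * kappa)."""
    mu = np.asarray(mu, dtype=float)
    return np.where(np.abs(mu) >= np.sqrt(2.0 * kappa), mu, 0.0)

def brute_force(mu, kappa):
    """Search all 2^N supports; on a kept coordinate the best weight is mu_n."""
    mu = np.asarray(mu, dtype=float)
    best_loss, best_m = np.inf, None
    for keep in itertools.product([False, True], repeat=len(mu)):
        keep = np.array(keep)
        m = np.where(keep, mu, 0.0)
        loss = 0.5 * np.sum((m - mu) ** 2) + kappa * keep.sum()
        if loss < best_loss:
            best_loss, best_m = loss, m
    return best_m

mu = np.array([2.0, -0.5, 1.0, -3.0])   # hypothetical weights
kappa = 1.0
print(hard_threshold(mu, kappa))
print(brute_force(mu, kappa))            # 16 candidate supports for N = 4
```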

Graphically, we can see the intuition for the problem posed by non-convexity for $\alpha \in (0,1)$ in the figure below. For the solid blue line representing the $\alpha = 1/5$ case, an incremental increase $m_n: 0 \to \epsilon$, where $\epsilon$ is just marginally greater than $0$, has a cognitive cost much greater than its benefit; however, for $m_n: 0 \to \mathcal{E}$ where $\mathcal{E} \gg 0$, the benefit of the increase in the weighting factor $m_n$ outweighs its cognitive cost. The $\alpha = 0$ case can be seen as an even more extreme limiting case: the blue line becomes increasingly kinked and eventually becomes a flat line at height $\kappa = 1$ for every $m_n \neq 0$.

The solid colored lines represent the cognitive costs faced by a boundedly rational agent with kappa = 1 when alpha = 1/2, 1/3, 1/4, 1/5 respectively. The dashed black line represents the gain to the boundedly rational agent to increasing the weight m_n on a particular decision factor when gamma = 1.

Looking at the previous $2$ examples, we can follow the Goldilocks logic and see that there is a particular parameterization of $\alpha$ which yields both sparsity and convexity, namely $\alpha = 1$.

Example (Linear Complexity Costs): Finally, consider the case where $\alpha = 1$ . Here, we find that:

(10) $\begin{align*} 0 &= \partial_{m_n} L^{\scriptscriptstyle \mathtt{BR}}(m,\mu;1) \\ &= - (m_n - \mu_n) - \kappa \cdot \mathtt{sgn} [m_n] \end{align*}$

where $\mathtt{sgn}[\cdot]$ denotes the sign operator which returns the sign of a real constant. Thus, we have $3$ different options for the optimal choice of $m_n$ :

(11) $\begin{align*} m_n &= \begin{cases} \mu_n + \kappa &\text{ if } \mu_n \leq - \kappa \\ 0 &\text{ if } |\mu_n| < \kappa \\ \mu_n - \kappa &\text{ if } \mu_n \geq \kappa \end{cases} \end{align*}$

Thus, the setting where $\alpha = 1$ is both analytically tractable and sparse as any variables with $|\mu_n| < \kappa$ will be ignored by the decision maker.
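Rule (11) is the familiar soft-thresholding operator. As a sanity check, the sketch below (hypothetical weights, $\gamma = 1$) compares the closed form against a brute-force grid search over the one-dimensional objective $\frac{1}{2} (m_n - \mu_n)^2 + \kappa |m_n|$.

```python
import numpy as np

def soft_threshold(mu, kappa):
    """Rule (11): shrink each mu_n toward 0 by kappa and clip at 0."""
    mu = np.asarray(mu, dtype=float)
    return np.sign(mu) * np.maximum(np.abs(mu) - kappa, 0.0)

mu = np.array([-2.0, -0.5, 0.0, 0.3, 1.5])   # hypothetical weights
kappa = 1.0
m_closed = soft_threshold(mu, kappa)

# Grid search: minimize 0.5 * (m - mu_n)^2 + kappa * |m| for each n separately.
grid = np.linspace(-4.0, 4.0, 8001)
loss = 0.5 * (grid[None, :] - mu[:, None]) ** 2 + kappa * np.abs(grid)[None, :]
m_grid = grid[np.argmin(loss, axis=1)]

print(m_closed)
print(m_grid)
```

Unlike the quadratic case, every weight with $|\mu_n| < \kappa$ lands exactly on $0$, and unlike the fixed cost case, each coordinate can be solved on its own with a convex objective.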

3. Static Problem Formulation

In the above example, I showed how to embed sparsity into Bill’s decision problem using a linear cost function. However, Bill’s problem was extremely simple in the sense that he had a linear-quadratic value function. Can the intuition developed in this simple example be extended to more complicated examples with more elaborate utility functions? In the simple example, I found that there was a clean knife-edge result regarding $\alpha=1$ being the only power coefficient which delivered both sparsity and a tractable solution; however, this result depended on Bill’s utility gain to increasing his weighting $m_n$ on decision factor $n$ being linear (see the black dashed line in the $2$ figures above). Will this same intuition regarding $\alpha = 1$ hold for more complicated utility functions?

Gabaix (2011) shows that the intuition indeed holds for a wide variety of utility specifications after using an appropriate quadratic approximation of the problem around a reference representation of the world $\bar{m}$ and action $\bar{a}$. For instance, in Bill’s problem above, this would be like allowing his value function $V(a;m,x)$ to be non-quadratic and then approximating his problem as quadratic around a reference $\bar{m} = \begin{bmatrix} \mu_1 & \mu_2 & 0 & \cdots & 0 \end{bmatrix}$, implying that his default is to ignore all variables except for last month’s demand and the current value of GDP, and around a reference action of producing $\bar{a} = 100$ cars.

In order to construct this approximation, I need to define $3$ objects. First, for an arbitrary value function $V(a;m,x)$ define the partial derivatives $V_{a,m}$ and $V_{a,a}$ as:

(12) $\begin{align*} V_{a,m} &= \left. \frac{\partial^2 V}{\partial a \cdot \partial m}(a;m,x) \right|_{\bar{a},\bar{m}} \\ V_{a,a} &= \left. \frac{\partial^2 V}{(\partial a)^2}(a;m,x) \right|_{\bar{a},\bar{m}} \end{align*}$

where $V_{a,a}$ is negative definite implying that $V(a;m,x)$ is strictly concave in $a$ in a neighborhood of $\bar{a}$. I then use these $2$ matrices to define a weighting matrix $\Lambda$ which captures how much information is lost via the quadratic approximation:

(13) $\begin{align*} \Lambda &= - \mathtt{E} \left[ V_{a,m}^{\top} V_{a,a}^{-1} V_{a,m} \right] \end{align*}$

This matrix $\Lambda$ corresponds to the weighting matrix $S$ used in GMM to differentially interpret the error terms from each of the equations a la Cochrane (2005). For instance, when digesting a vector of errors from a set of pricing equations in GMM, a large $s_{i,j}$ term in the weighting matrix $S$ means that a set of coefficients which produce large errors in equations $i$ and $j$ at the same time will be penalized more heavily. In the quadratic approximation below, $\Lambda$ will be used to differentially interpret the loadings $m_n$ on different decision factors.

Step 1 (Choose Representation): The agent chooses his optimal sparse representation of the world $m$ as the solution to the optimization problem:

(14) $\begin{align*} \min_{m} \left\{ \ \frac{1}{2} \cdot (m - \mu)^{\top} \Lambda (m - \mu) + \kappa(m) \ \right\} \end{align*}$

where $\kappa(m)$ captures the cognitive cost of a model and is defined as:

(15) $\begin{align*} \kappa(m) &= \kappa^m \cdot \sum_{n=1}^N |m_n - \bar{m}_n| \cdot \left\Vert V_{m_n,a} \cdot \eta_m \right\Vert \end{align*}$

In the formulation above, $\left\Vert V_{m_n,a} \cdot \eta_m \right\Vert$ plays the same role as $1_{\scriptscriptstyle \frac{\mathtt{dim}[x_n]}{\mathtt{cars}}}$ in the motivating example and simply controls the units and scale of the agent’s choice of representations. In general, $\eta_m$ will not be problem specific in any material sense.
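Problem (14) with the cost in (15) is a weighted $\ell_1$ penalized quadratic program. When $\Lambda$ is not diagonal there is no coordinate-by-coordinate closed form, but the problem stays convex. The sketch below solves it with proximal gradient descent (ISTA), which is a standard technique I am swapping in for illustration and not necessarily the method used in Gabaix (2011). The vector `w` stands in for $\kappa^m \cdot \Vert V_{m_n,a} \cdot \eta_m \Vert$ and, like every number here, is hypothetical.

```python
import numpy as np

def soft_threshold(z, thresh):
    return np.sign(z) * np.maximum(np.abs(z) - thresh, 0.0)

def objective(m, mu, Lam, w, m_bar):
    """Objective in (14) with kappa(m) = sum_n w_n * |m_n - m_bar_n|."""
    d = m - mu
    return 0.5 * d @ Lam @ d + np.sum(w * np.abs(m - m_bar))

def sparse_representation(mu, Lam, w, m_bar=None, n_iter=1000):
    """Proximal gradient (ISTA) for (14); Lam must be symmetric PSD."""
    mu = np.asarray(mu, dtype=float)
    w = np.asarray(w, dtype=float)
    m_bar = np.zeros_like(mu) if m_bar is None else np.asarray(m_bar, dtype=float)
    step = 1.0 / np.linalg.eigvalsh(Lam).max()   # 1 / Lipschitz constant
    m = m_bar.copy()
    for _ in range(n_iter):
        grad = Lam @ (m - mu)                    # gradient of the quadratic part
        m = m_bar + soft_threshold(m - step * grad - m_bar, step * w)
    return m

# Hypothetical example with correlated loadings.
mu = np.array([1.0, 0.2])
Lam = np.array([[1.0, 0.5], [0.5, 1.0]])
w = np.array([0.3, 0.3])
m_star = sparse_representation(mu, Lam, w)
print(m_star, objective(m_star, mu, Lam, w, np.zeros(2)))
```

With $\Lambda$ equal to the identity, $\bar{m} = 0$, and a constant weight $\kappa$, the iteration reproduces the soft-thresholding rule (11) from the motivating example.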

Step 2 (Choose Actions): The agent maximizes over his choice of $J$ actions $a$ :

(16) $\begin{align*} \max_a \left\{ \ V(a;m,x) - \kappa(a) \ \right\} \end{align*}$

where the cognitive cost of deviating from the default action is $\kappa(a)$ :

(17) $\begin{align*} \kappa(a) &= \kappa^a \cdot \sum_{j=1}^J |a_j - \bar{a}_j| \cdot \left\Vert V_{a_j,m} \cdot \eta_a \right\Vert \end{align*}$

In the motivating example, Bill did not face Step $2$ of the algorithm at all as his choice of car production levels was unconstrained after choosing a representation of the world $m$. This second step allows for the difficulty of adjusting the action given the agent’s understanding of the decision problem. For example, consider a decision maker who makes a portfolio decision regarding how to allocate his firm’s wealth. After choosing a representation of the world in Step $1$, he might look at the facts $x$ through the lens of this representation $m$ and decide that he needs to lengthen the duration of his portfolio; however, since there are many different ways to do this in practice, he may face cognitive costs in executing such a change. Step $2$ captures these sorts of costs.

In summary, a boundedly rational agent with a preference for sparsity first employs a quadratic approximation of his value function and then uses a linear cost function to price the cognitive cost of the complexity of his model of the world. What’s more, this same linear cognitive cost function can be used in a secondary step to incorporate settings where there are cognitive costs to executing a given action.

4. Application to Dynamic Programming

I conclude by showing how to use these tools to solve for a representative agent’s optimal consumption choice in an infinite horizon economy by treating the terms in a Taylor expansion of the optimal consumption rule around the steady state as increasingly complicated decision factors. Consider a standard, discrete time, consumption based model where a representative agent has power utility with risk aversion parameter $\gamma$ and time preference parameter $\beta$ described by the preferences below:

(18) $\begin{align*} \mathtt{E}\left[ \ \sum_{t=0}^\infty \beta^t \cdot \frac{c_t^{1 - \gamma}}{1-\gamma} \ \right] \end{align*}$

Assume for simplicity that there is a single asset with constant return $\bar{r}$ where $\beta \cdot (1 + \bar{r}) = 1$. Then, I can write this representative agent’s problem recursively as follows:

(19) $\begin{align*} \overline{V}(w_t) &= \max_{c_t} \left\{ \ u(c_t) + \beta \cdot \mathtt{E}_t \left[ \ \overline{V}\left( (1 + \bar{r}) \cdot (w_t -c_t) + \bar{y} \right) \ \right] \ \right\} \\ w_{t+1} &= (1 + \bar{r}) \cdot (w_t - c_t) + \bar{y} \end{align*}$

where $\overline{V}(w_t)$ is the agent’s value function given wealth level $w_t$ and $\bar{y}$ is the constant endowment rate. In this world, it is easy to derive that the optimal consumption choice is given by:

(20) $\begin{align*} \ln \bar{c}_t &= \ln \left[ \bar{r} \cdot w_t + \bar{y} \right] - \bar{r} \end{align*}$
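One way to read (20), assuming $\beta \cdot (1 + \bar{r}) = 1$ so that the Euler equation implies constant consumption: the level rule $\bar{c}_t = (\bar{r} \cdot w_t + \bar{y}) / (1 + \bar{r})$ keeps wealth constant under the transition equation in (19), and taking logs with $\ln(1 + \bar{r}) \approx \bar{r}$ gives (20). The sketch below checks both statements with hypothetical parameter values.

```python
import math

def baseline_consumption(w, r_bar, y_bar):
    """Level version of (20), assuming beta * (1 + r_bar) = 1."""
    return (r_bar * w + y_bar) / (1.0 + r_bar)

def next_wealth(w, c, r_bar, y_bar):
    """Wealth transition from (19)."""
    return (1.0 + r_bar) * (w - c) + y_bar

r_bar, y_bar, w0 = 0.04, 1.0, 10.0   # hypothetical values
c0 = baseline_consumption(w0, r_bar, y_bar)
w1 = next_wealth(w0, c0, r_bar, y_bar)

exact_log = math.log(c0)
approx_log = math.log(r_bar * w0 + y_bar) - r_bar   # right-hand side of (20)

print(w0, w1)                  # wealth stays put
print(exact_log, approx_log)   # differ only by ln(1 + r) - r
```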

In the remainder of this example, I assume that even the boundedly rational representative agent can solve this simple problem in closed form. However, when I tweak the problem and allow the true values to be $r_t = \bar{r} + \hat{r}_t$ and $y_t = \bar{y} + \hat{y}_t$ where the idiosyncratic terms $\hat{r}_t$ and $\hat{y}_t$ evolve according to $\mathtt{AR}(1)$ processes below with mean $0$ shocks $\epsilon_{y,t+1}$ and $\epsilon_{r,t+1}$ :

(21) $\begin{align*} \hat{y}_{t+1} &= \rho_y \cdot \hat{y}_t + \epsilon_{y,t+1} \\ \hat{r}_{t+1} &= \rho_r \cdot \hat{r}_t + \epsilon_{r,t+1} \end{align*}$

the representative agent will want to seek a sparse solution due to cognitive costs. Introducing these idiosyncratic terms yields a more complicated value function $V$ with $2$ additional state variables when writing the problem recursively:

(22) $\begin{align*} V(w_t;r_t,y_t) &= \max_{c_t} \left\{ \ u(c_t) + \beta \cdot \mathtt{E}_t \left[ \ V\left( (1 + \bar{r} + \hat{r}_t) \cdot (w_t -c_t) + \bar{y} + \hat{y}_t;r_{t+1},y_{t+1} \right) \ \right] \ \right\} \\ w_{t+1} &= (1 + \bar{r} + \hat{r}_t) \cdot (w_t - c_t) + \bar{y} + \hat{y}_t \end{align*}$

where there will in general be no closed form solution for the optimal choice of $\ln c_t^{\scriptscriptstyle \mathtt{Rat}}$ .
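To get a feel for the two extra state variables, the deviations in (21) can be simulated directly. The sketch below assumes Gaussian shocks (the text only requires mean $0$) and uses hypothetical persistence and volatility parameters. It also compares the simulated standard deviations with the stationary value $\sigma_\epsilon / \sqrt{1 - \rho^2}$.

```python
import numpy as np

def simulate_ar1(rho, sigma_eps, T, rng, x0=0.0):
    """Simulate x_{t+1} = rho * x_t + eps_{t+1} with eps ~ N(0, sigma_eps^2)."""
    x = np.empty(T)
    x[0] = x0
    for t in range(T - 1):
        x[t + 1] = rho * x[t] + sigma_eps * rng.standard_normal()
    return x

rng = np.random.default_rng(0)
T = 100_000
rho_y, sig_y = 0.9, 0.1     # hypothetical endowment process
rho_r, sig_r = 0.5, 0.01    # hypothetical interest rate process

y_hat = simulate_ar1(rho_y, sig_y, T, rng)
r_hat = simulate_ar1(rho_r, sig_r, T, rng)

# Stationary standard deviations implied by (21).
sd_y = sig_y / np.sqrt(1.0 - rho_y ** 2)
sd_r = sig_r / np.sqrt(1.0 - rho_r ** 2)
print(y_hat.std(), sd_y)
print(r_hat.std(), sd_r)
```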

However, suppose that we took a Taylor expansion of the optimal $\ln c_t^{\scriptscriptstyle \mathtt{Rat}}$ around the benchmark solution to get:

(23) $\begin{align*} \ln c_t^{\scriptscriptstyle \mathtt{Rat}} &= \ln \bar{c}_t + b_y^{\scriptscriptstyle \mathtt{Rat}} \cdot \hat{y}_t + b_r^{\scriptscriptstyle \mathtt{Rat}} \cdot \hat{r}_t + \mathtt{h.o.t.} \\ b_y^{\scriptscriptstyle \mathtt{Rat}} &= \frac{\bar{r}}{(1 + \bar{r}) \cdot (1 + \bar{r} - \rho_y) \cdot \bar{c}_t} \\ b_r^{\scriptscriptstyle \mathtt{Rat}} &= \frac{\frac{\bar{r} \cdot (w_t - c_t)}{c_t} - \frac{1}{\gamma}}{1 + \bar{r} - \rho_r} \end{align*}$

I treat these $2$ coefficients as the boundedly rational representative agent’s $\bar{m}$. Then, I can define his optimal log consumption choice as:

(24) $\begin{align*} \ln c_t^{\scriptscriptstyle \mathtt{BR}} &= \ln \bar{c}_t + b_y^{\scriptscriptstyle \mathtt{BR}} \cdot \hat{y}_t + b_r^{\scriptscriptstyle \mathtt{BR}} \cdot \hat{r}_t \end{align*}$

where $b_r^{\scriptscriptstyle \mathtt{BR}}$ and $b_y^{\scriptscriptstyle \mathtt{BR}}$ are given by the rule:

(25) $\begin{align*} b_i^{\scriptscriptstyle \mathtt{BR}} &= \begin{cases} b_i^{\scriptscriptstyle \mathtt{Rat}} &\text{ if } | b_i^{\scriptscriptstyle \mathtt{Rat}} | \geq \mathtt{constant} \\ 0 &\text{ else } \end{cases} \end{align*}$

A reasonable choice of this constant might be $\kappa \cdot \sigma_{\ln c} / \sigma_i$ for $i = r,y$. It has the interpretation that the agent sets $b_i^{\scriptscriptstyle \mathtt{BR}} = b_i^{\scriptscriptstyle \mathtt{Rat}}$ whenever a one standard deviation move in the interest rate or endowment level would change log consumption by at least $\kappa$ standard deviations, and sets $b_i^{\scriptscriptstyle \mathtt{BR}} = 0$ otherwise. Thus, a sparsity-seeking boundedly rational representative agent has a particular rule in mind when figuring out which terms of the Taylor expansion to ignore.
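To make rule (25) concrete, the sketch below evaluates the coefficients in (23) and applies the cutoff $\kappa \cdot \sigma_{\ln c} / \sigma_i$. Every parameter value is hypothetical, the function names are my own, and I evaluate $b_r^{\scriptscriptstyle \mathtt{Rat}}$ at the baseline consumption level $\bar{c}_t$, which is an assumption on my part.

```python
def rational_coefficients(r_bar, rho_y, rho_r, gamma, w, c_bar):
    """Coefficients b_y and b_r from (23), evaluated at c_t = c_bar."""
    b_y = r_bar / ((1.0 + r_bar) * (1.0 + r_bar - rho_y) * c_bar)
    b_r = (r_bar * (w - c_bar) / c_bar - 1.0 / gamma) / (1.0 + r_bar - rho_r)
    return b_y, b_r

def sparse_coefficient(b_rat, kappa, sigma_lnc, sigma_i):
    """Rule (25) with the cutoff kappa * sigma_lnc / sigma_i."""
    cutoff = kappa * sigma_lnc / sigma_i
    return b_rat if abs(b_rat) >= cutoff else 0.0

# Hypothetical parameter values, for illustration only.
r_bar, y_bar, w = 0.04, 1.0, 10.0
rho_y, rho_r, gamma = 0.9, 0.5, 2.0
kappa, sigma_lnc, sigma_y, sigma_r = 0.5, 0.02, 0.1, 0.01

c_bar = (r_bar * w + y_bar) / (1.0 + r_bar)   # assumes beta * (1 + r_bar) = 1
b_y, b_r = rational_coefficients(r_bar, rho_y, rho_r, gamma, w, c_bar)

b_y_br = sparse_coefficient(b_y, kappa, sigma_lnc, sigma_y)
b_r_br = sparse_coefficient(b_r, kappa, sigma_lnc, sigma_r)
print(b_y, b_y_br)   # kept: |b_y| exceeds its cutoff of 0.1
print(b_r, b_r_br)   # dropped: |b_r| is below its cutoff of 1.0
```

Under these made-up numbers the agent responds to endowment deviations but ignores interest rate deviations, because interest rate shocks are too small relative to the volatility of log consumption to clear the cutoff.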