Pearson-Wong Diffusions

1. Introduction

I introduce the concept of Pearson-Wong diffusions and then show how this mathematical object can be put to use in macro-finance.

Roughly speaking, Pearson-Wong diffusions link properties of stochastic processes to properties of cross-sectional distributions in the resulting population. For example, suppose you have in mind a stochastic process that governs the total sales of each firm in the US. If this stochastic process is a Pearson-Wong diffusion you would also know what the steady state cross-sectional distribution of firm sales would be. Conversely, if you observed a particular cross-sectional distribution of firm sales, then if you assumed all firms had a similar sales growth process and that the economy was in a steady state, you could then back out which Pearson-Wong diffusion was governing sales growth in the economy up to an affine transformation.

First, in the Section 2, I define the Pearson (1895) system of distributions. Next, in Section 3, I elaborate on work by Wong (1964) and show that a broad class of diffusion processes with polynomial volatility called Pearson-Wong diffusions lead to steady state distributions in the Pearson system. I show that these distributions are uniquely defined by their polynomial coefficients. In Section 4, I show how a broad set of common continuous time processes in macro-finance such as Ornstein-Uhlenbeck processes and Feller square root processes sit in this class of Pearson-Wong diffusions. Finally, I conclude in Section 5 by returning to this sales volume example above taken from Gabaix (2011) and showing that variation in the cross-sectional distribution of firm sales volume implies variation in the functional form of the stochastic process governing each firm’s aggregate sales.

2. The Pearson System of Distributions

In this section I motivate and define the Pearson system of distributions. Karl Pearson developed the Pearson system of distributions as a taxonomy for understanding the skewed distributions he was finding in the biological data he was studying. For instance, Pearson had access to data on dimensions of crabs caught off the coast of Naples as illustrated in the figure below1. When studying the ratio of the length of the crabs to their breadth, he found a distribution that was non-normal, and almost seemed to be a mixture of 2 normal distributions.

The data give the ratio of "forehead" breadth to body length for 1000 crabs sampled at Naples. Source: R mixdist package (

In order to manipulate these data analytically, Pearson then searched out a simple functional form that would capture the main features of these skewed distributions with only a handful of parameters. In particular, he was after a formulation that fit continuous, single peaked distributions over various ranges with varying levels of skewness and kurtosis. Through guess and check, he settled on the definition below2:

Definition (Pearson System): A continuous, univariate, probability distribution p(x) over x \in (\underline{x}, \overline{x}) is a member of the Pearson System if it satisfies the differential equation below with constants r, a, b and c:

(1)   \begin{align*} \frac{d}{dx} p(x) &= \frac{x-r}{a \cdot x^2 + b \cdot x + c} \cdot p(x) \end{align*}

What are the features of this formulation? First, if r is not a root of the polynomial a \cdot x^2 + b \cdot x + c then p(x) is finite. Next, we see that r characterizes the signle peak of the distribution as dp(x)/dx = 0 when x = r. What’s more, we know that p(x) has to be single peaked as p(x) \geq 0 and \int_{\underline{x}}^{\overline{x}} p(x) dx = 1, so p(x) and d p(x)/dx must tend towards 0 as x goes to \pm \infty.

Heuristically speaking, we can think about r as parameterizing the peak of the distribution and the quadratic polynomial as characterizing the rate of descent from this peak in either direction as a function of x. Importantly, the solution to this differential equation will depend on the character of the roots of the quadratic polynomial:

(2)   \begin{align*} 0 &= a \cdot x^2 + b \cdot x + c \end{align*}

In his original 1895 paper, Pearson spent most of his time actually classifying different types of distributions based on the nature of their respective polynomials. Below is a short list of distributions that fall into the Pearson class:

    \begin{align*} \begin{array}{|l|} \text{Distribution} \\ \hline \hline \textit{Normal} \\ \textit{Gamma} \\ \textit{Inverse-Gamma} \\ \textit{Student t} \\ \textit{Beta} \end{array} \end{align*}

Below, I walk through an example showing how the normal distribution fits into the Pearson system:

Example (Gaussian Distribution): When a = b = 0 we get the Gaussian PDF. First, note that given these assumptions, the differential equation above can be written as:

(3)   \begin{align*} \frac{d}{dx} \ln p(x) &= -\frac{x-r}{c} \end{align*}

Thus, by integrating up we see that the solution has the form:

(4)   \begin{align*} p(x) &= k \cdot e^{-\frac{(x-r)^2}{2 \cdot c}} \end{align*}

If we choose k such that the probability mass over the real line is 1, we get k = \sqrt{2 \cdot \pi \cdot c}.

3. Main Results

In this section I define the class of Pearson-Wong diffusions and outline the mapping between the coefficients of the stochastic process and the parameters of the cross-sectional distribution. In the analysis below, I consider time homogeneous diffusion processes; i.e., the coefficients of the stochastic process can only depend on time t through the value of X_t:

Definition (Time Homogeneous Diffusion): Let m(x) and \sigma(x) be real valued functions that are Lipschitz on the interval (\underline{x}, \overline{x}) with \sigma(x) > 0. Then a diffusion X_t is a time-homogeneous diffusion if there exists a unique solution to the equation:

(5)   \begin{align*} X_t &= x_0 + \int_0^t m(X_s) \cdot ds + \int_0^t \sigma(X_s) \cdot dB_s \end{align*}

Next, I define the class of Pearson-Wong diffusions:

Definition (Pearson-Wong Diffusion): A Pearson-Wong polynomial diffusion is stationary, time homogeneous solution to a stochastic differential equation of the form below, where \theta > 0, B_t is a Brownian motion and the triple of coefficients (\alpha, \beta, \gamma) are such that the square root is well defined when X_t is in the state space (\underline{x}, \overline{x}):

(6)   \begin{align*} dX_t &= \theta \cdot (\mu - X_t) \cdot dt + \sqrt{2 \cdot \theta \cdot (\alpha \cdot X_t^2 + \beta \cdot X_t + \gamma)} \cdot dB_t \end{align*}

What sorts of processes fit inside this class of diffusions? For one example, consider an Ornstein-Uhlenbeck process which would arise if we set \alpha = \beta = 0, \gamma = 1 and \theta \in (0,1). In this setting, \sigma^2 = 2 \cdot \theta. In the next section, I show how more exotic process also fit into this box of Pearson-Wong diffusions.

Now, given this definition, I need to derive a mapping between the values of the polynomial coefficients \begin{bmatrix} \alpha & \beta & \gamma \end{bmatrix} and the form of the resulting cross-sectional distribution p(x). I do this in 3 steps. First, I characterize the scale function U(x) and speed density P(x) of the diffusion X_t. Next, I link the infinitesimal generator of the diffusion process X_t to the scale function and speed density of X. Finally, I show that given the mapping between the infinitesimal generator and stochastic processes in the class of Pearson diffusions, if X_t is an ergodic process then this mapping is unique.

Below, I define the scale function for a stochastic process X_t which captures how much the probability of reaching different points y and w in the domain of X_t varies with the starting point x:

Definition (Scale Function): Let X_t be a 1 dimensional diffusion on the open interval (\underline{x}, \overline{x}). A scale function for X_t is an increasing function U(x): (\underline{x}, \overline{x}) \mapsto \mathbb{R} such that for all w < x < y with w,y \in (\underline{x}, \overline{x}), we have that:

(7)   \begin{align*} \mathtt{Pr}[\tau(y) < \tau(w)] &= \frac{U(x) - U(w)}{U(y) - U(w)} \end{align*}

where \tau(w) = \inf\left\{ t > 0 \mid X_t = w \right\}.

For instance, if U(x) = x is a scale function for X_t, then we say that X_t is in its natural scale. By definition, U(x) is a local martingale and satisfies the equation:

(8)   \begin{align*} 0 &= m(x) \cdot U'(x) + \frac{\sigma(x)^2}{2} \cdot U''(x) \end{align*}

This is a linear first order differential equation of u(x) with variable coefficients leading to a standard solution:

(9)   \begin{align*} \ln u(x) &= \int_{x_0}^{\overline{x}} \frac{s - \mu}{\alpha \cdot s^2 + \beta \cdot s + \gamma} \cdot ds \end{align*}

where x_0 is a fixed point such that \alpha \cdot x_0^2 + \beta \cdot x_0 + \gamma > 0. Next, I define a speed measure P(x) which captures the probability that x will exceed a certain value in finite time; i.e., will ever reach a value:

Definition (Speed Measure): The speed measure P(x) is the measure such that the infinitesimal generator of X_t can be written as:

(10)   \begin{align*} \mathbb{A} f(x) &= \frac{d^2}{dP \cdot dU} f(x) \end{align*}

where we have that:

(11)   \begin{align*} \frac{d}{dU} f(x) &= \lim_{h \to 0} \frac{f(x+h) - f(x)}{U(x+h) - U(x)} \\ \frac{d}{dP} g(x) &= \lim_{h \to 0} \frac{g(x+h) - g(x)}{P(x,x+h)} \end{align*}

Thus, it is in fact the \mathtt{PDF} of the cross-sectional distribution as we consider this probability as t \to \infty. This measure has a particularly nice functional form which allows for easy analytical computations in the case of Pearson-Wong diffusions. The lemma below characterizes this formulation:

Lemma (Speed Measure): The speed density of a Pearson-Wong diffusion is given by the fomula below where x_0 is a fixed point such that \alpha \cdot x_0^2 + \beta \cdot x_0 + \gamma > 0:

(12)   \begin{align*} p(x) &= \frac{1}{u(x) \cdot (\alpha \cdot x^2 + \beta \cdot x + \gamma)} \end{align*}

with P'(x) = p(x) and U'(x) = u(x).

The proof of this lemma stems from the definition of the infinitesimal generator:

Proof (Speed Measure): On one hand, from the definition of the speed measure, we have that:

(13)   \begin{align*} \mathbb{A} f(x) &= \frac{d^2}{dU(x) \cdot dP(x)} f(x) \\ &= \frac{1}{U'(x)} \cdot \frac{d}{dx} \left( \frac{1}{P'(x)} \cdot \frac{d}{dx} f(x) \right) \\ &= \frac{1}{u(x) \cdot p(x)} \cdot f''(x) - k(x) \cdot f'(x) \end{align*}

where k(x) is some well behaved function of x. On the other hand, from the definition of an infinitesimal generator, we have that:

(14)   \begin{align*} \mathbb{A} f(x) &= \frac{1}{2} \cdot \sigma(x)^2 \cdot f''(x) + m(x) \cdot f'(x) \end{align*}

Thus, we have that \left[ u(x) \cdot p(x) \right]^{-1} \propto \sigma(x)^2/2.

Thus, we have now marched through the framework for first 2 steps of the construction of the link between a stochastic process in the class of Pearson-Wong diffusions and their corresponding cross-sectional distributions. All I need to do now is flesh out the requirements for uniqueness. In order to attain this property, I need an additional assumption on the class of Pearson-Wong diffusions: ergodicity. Below, I give a formal definition of this additional assumption:

Definition (Ergodic Pearson-Wong Diffusion): If (\underline{x}, \overline{x}) is an interval such that a \cdot x^2 + b \cdot x + c > 0 for all x \in (\underline{x}, \overline{x}) for a Pearson-Wong diffusion X_t, then X_t is ergodic if:

(15)   \begin{align*} \infty &= \int_{x_0}^{\overline{x}} u(x) \cdot dx = \int_{\underline{x}}^{x_0} u(x) \cdot dx \\ \infty &> \int_{\underline{x}}^{\overline{x}} V(x) \cdot dx \end{align*}

If \int_{x_0}^{\overline{x}} u(x) \cdot dx, then the boundary \underline{x} can be reached in finite time with positive probability.

Proposition (Pearson-Wong Mapping): For all ergodic diffusions in the Pearson-Wong class parameterized by the coefficient vector \begin{bmatrix} \alpha & \beta & \gamma \end{bmatrix}, there exists a unique invariant distribution in the Pearson system.

Ergodicity ensures that there are no eddies in the state space where multiple diffusions can get trapped yielding observationally equivalent cross-sectional distributions p(x) for different diffusion processes.

Proof (Pearson-Wong Mapping): From the lemma above, we know that the scale measure as the density:

(16)   \begin{align*} u(x) &= e^{\int_{x_0}^x \frac{s - \mu}{\alpha \cdot s^2 + \beta \cdot s + \gamma} \cdot ds} \end{align*}

where x_0 \in (\underline{x}, \overline{x}) is a point such that \alpha \cdot x_0^2 + \beta \cdot x_0 + \gamma > 0. What’s more, we know that:

(17)   \begin{align*} p(x) &\propto \frac{1}{u(x) \cdot (\alpha \cdot x^2 + \beta \cdot x + \gamma)} \end{align*}

Differentiating p(x) yields:

(18)   \begin{align*} \frac{d}{dx} p(x) &= - \frac{(2 \cdot \alpha + 1) \cdot x - \mu + \beta}{\alpha \cdot x^2 + \beta \cdot x + \gamma} \cdot p(x) \end{align*}

4. Examples

In this section I work through 2 examples which illustrate how to fit the Vasicek process and a reflecting process that generates a cross-sectional distribution that satisfies Zipf’s law.

In a Vasicek model returns follow an Ornstein-Uhlenbeck process:

(19)   \begin{align*} dX_t &= \theta \cdot (\mu - X_t) \cdot dt + \sigma \cdot dB_t \end{align*}

with \theta, \mu, \sigma > 0. Thus, in the functional notation of the Pearson-Wong diffusion, we have that 2 \cdot \theta = \sigma^2, a,b=0 and c=1. Using the formulation above, we see that:

(20)   \begin{align*} \frac{d}{dx} \ln p(x) &= - \frac{x - \mu}{\sigma} \end{align*}

This is the exact same formulation as the Ornstein-Uhlenbeck example from the first section. Thus, we have that:

(21)   \begin{align*} p(x) &= \frac{1}{\sqrt{2 \cdot \pi \cdot \sigma}} \cdot e^{-\frac{(x - \mu)^2}{2 \cdot \sigma^2}} \end{align*}

by solving for the constant via the boundary condition that \int_{-\infty}^\infty p(x) \cdot dx = 1.

Next, consider a more complicated reflecting process that is defined only on the positive half-line in the form of a power law distribution with reflecting boundary at \underline{x}>0. Specifically, suppose that you have a cross-sectional probability density p(x) defined as:

(22)   \begin{align*} p(x) &\propto \frac{1}{x^2} \end{align*}

which is defined on (\underline{x},\infty). We see that the cummulative probability density is proportional to 1/x so that Zipf’s law3 holds. However, note that there is no x term in the numerator of the differential equation defining p(x):

(23)   \begin{align*} \frac{d}{dx} p(x) &= - \frac{2}{x} \cdot p(x) \end{align*}

Thus, the power law cross-sectional distribution acts as a limiting case of the class of Pearson-Wong diffusions with \beta = \gamma = 0 and \alpha \gg \mu:

(24)   \begin{align*} \frac{d}{dx} \ln p(x) &= - \frac{2}{x} \\ &= - \frac{\overbrace{(2 \cdot \alpha + 1)}^{\approx 2 \cdot \alpha} \cdot x}{\alpha \cdot x^2} - \underbrace{\left(\frac{\mu}{\alpha \cdot x^2}\right)}_{\approx 0} \end{align*}

This solution works given the reflecting boundary \bar{x} > 0 as, for \alpha large enough the second term on the right hand side will be roughly 0 and the +1 in the first term will be negligible.

5. Conclusions

In the text above, I outline the topic of Pearson-Wong diffusions and also relate these results in continuous time mathematics to topics in macro-finance.

I conclude by looking at a final application in a recent Econometrica article, Gabaix (2011), on the granular origins of aggregate macroeconomic fluctuations. The core idea of this paper is that, when the cross-sectional distribution of firm production, S_{t,n}, is Gaussian or some other thin-tailed distribution, shocks to the largest firms won’t matter as the number of firms N \to \infty. However, if firm production is distributed according to a fat tailed distribution, then shocks to the production of the largest firms will matter.4

Proposition 2 of Gabaix (2011) gives the central result. Namely, that if firm size is distributed according to a power law,

(25)   \begin{align*} \mathtt{Pr}[S > \bar{s}] &= a \cdot x^{-\zeta}, \end{align*}

then as N \to \infty, if \zeta \geq 2 shocks to large firms won’t matter, while if \zeta \in [1,2) shocks to large firms will matter.

Interestingly, the Pearson-Wong diffusion mathematics above gives us a new results for the implications of switching from the limiting case of \zeta = 1 to the case of \zeta \in (0,1). With \zeta \in (0,1), there will now be a new parameter \beta to estimate. Thus, variation in how dispersed firms are in terms of their output reveals meaningful information about the structure of the stochastic process to which each firm’s output adheres.

  1. Source: R mixdist package.
  2. Background info comes from Ord (1985).
  3. Gabaix (1999) or Tao (2009).
  4. In practice, shocks to the largest couple of firms do seem to have an impact on even large economies. For example, in December 2004, Microsoft issued a \$24B one-time dividend which boosted the growth in average personal income that year from 0.6\% to 3.7\% in the United States.