Impulse-Response Functions for VARs

1. Motivating Example

If you regress the current quarter’s inflation rate, x_t, on the previous quarter’s rate using data from FRED over the period from Q3-1987 to Q4-2014, then you get the AR(1) point estimate,

(1)   \begin{align*} x_t = \underset{(0.09)}{0.31} \cdot x_{t-1} + \epsilon_t, \end{align*}

where the number in parentheses denotes the standard error, and the inflation-rate time series, x_t, has been demeaned. In other words, if the inflation rate is \mathrm{StD}(\epsilon_t) \approx 0.47{\scriptstyle \%} points higher in Q1-2015, then on average it will be 0.31 \times 0.47 \approx 0.15{\scriptstyle \%} points higher in Q2-2015, 0.31^2 \times 0.47 \approx 0.05{\scriptstyle \%} points higher in Q3-2015, and so on… The function that describes the cascade of future inflation-rate changes due to an unexpected 1\sigma shock in period t is known as the impulse-response function.

But many interesting time-series phenomena involve multiple variables. For example, Brunnermeier and Julliard (2008) show that the house-price appreciation rate, y_t, is inversely related to the inflation rate. If you regress the current quarter’s inflation and house-price appreciation rates on the previous quarter’s rates using demeaned data from the Case-Shiller/S&P Index, then you get:

(2)   \begin{align*} \begin{bmatrix} x_t  \\ y_t \end{bmatrix} =  \begin{pmatrix} \phantom{-}\underset{(0.09)}{0.29} & \underset{(0.02)}{0.01} \\ -\underset{(0.43)}{0.40} & \underset{(0.09)}{0.50} \end{pmatrix} \begin{bmatrix} x_{t-1} \\ y_{t-1} \end{bmatrix} + \begin{pmatrix} \epsilon_{x,t}  \\ \epsilon_{y,t} \end{pmatrix}. \end{align*}

These point estimates indicate that, if the inflation rate were \mathrm{StD}(\epsilon_{x,t}) \approx 0.47{\scriptstyle \%} points higher in Q1-2015 (with no change in the Q1-2015 house-price appreciation rate), then the inflation rate would be 0.29 \times 0.47 \approx 0.14{\scriptstyle \%} points higher in Q2-2015 and the house-price appreciation rate would be 0.40 \times 0.47 \approx 0.19{\scriptstyle \%} points lower in Q2-2015.
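The calculation above is a single matrix-vector product. Here’s a minimal NumPy sketch using the rounded point estimates from Equation (2); the shock vector assumes a 0.47-point inflation surprise with no accompanying house-price surprise.

```python
import numpy as np

# Rounded point estimates of the coefficient matrix in Equation (2);
# row 0 is the x_t (inflation) equation, row 1 the y_t (house-price) equation.
Gamma = np.array([[0.29, 0.01],
                  [-0.40, 0.50]])

# Assumed shock: inflation 0.47 points higher, house-price appreciation unchanged.
shock = np.array([0.47, 0.0])

# Expected change in next quarter's (x, y) implied by this shock.
next_quarter = Gamma @ shock
print(next_quarter.round(2))   # approximately [0.14, -0.19]
```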

Computing the impulse-response function for this vector auto-regression (VAR) is more difficult than computing the same function for the inflation-rate AR(1) because the inflation rate and house-price appreciation rate shocks are correlated:

(3)   \begin{align*} \mathrm{Cor}(\epsilon_{x,t}, \epsilon_{y,t}) &= 0.13 \neq 0. \end{align*}

In other words, when you see a 1\sigma shock to inflation, you also tend to see, on average, a 0.13\sigma shock to the house-price appreciation rate, where each \sigma is measured in that variable’s own standard deviations. Thus, computing the future effects of a 1\sigma shock to the inflation rate and a 0{\scriptstyle \%} point shock to the house-price appreciation rate gives you information about a combination of shocks that you don’t typically see in the real world. In this post, I show how to account for this sort of correlation when computing the impulse-response function for VARs. Here is the relevant code.

2. Impulse-Response Function

Before studying VARs, let’s first define the impulse-response function more carefully in the scalar world. Suppose we have some data generated by an AR(1),

(4)   \begin{align*} x_t &= \gamma \cdot x_{t-1} + \epsilon_t, \end{align*}

where \mathrm{E}[x_t] = 0, \mathrm{E}[\epsilon_t] = 0, and \mathrm{Var}[\epsilon_t] = \sigma^2. For instance, if we’re looking at demeaned quarterly inflation data, x_t = \Delta \log \mathit{CPI}_t, then \gamma \approx 0.31 from Equation (1). In this setup, what would happen if there were a sudden 1\sigma shock to x_t in period t? How would we expect the level of x_{t+1} to change? What about the level of x_{t+2}? Or, the level of any arbitrary x_{t+h} for h > 0? How would a \mathrm{StD}(\epsilon_t) \approx 0.47{\scriptstyle \%} point shock to the current inflation rate propagate into future quarters?

Well, it’s easy to compute the time t expectation of x_{t+1}:

(5)   \begin{align*} \mathrm{E}_t[x_{t+1}] &= \mathrm{E}_t \left[ \, \gamma \cdot x_t + \epsilon_{t+1} \, \right] \\ &= \gamma \cdot x_t. \end{align*}

Iterating on this same strategy then gives the time t expectation of x_{t+2}:

(6)   \begin{align*} \begin{split} \mathrm{E}_t[x_{t+2}]  &= \mathrm{E}_t \left[ \, \gamma \cdot x_{t+1} + \epsilon_{t+2} \, \right]  \\ &= \mathrm{E}_t \left[ \, \gamma \cdot \left\{ \, \gamma \cdot x_t + \epsilon_{t+1} \, \right\} + \epsilon_{t+2} \, \right]  \\ &= \gamma^2 \cdot x_t. \end{split} \end{align*}
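A quick Monte Carlo check of Equations (5) and (6): simulate many paths from the same x_t and average. This sketch assumes Gaussian shocks, which the derivation does not require; any mean-zero shock distribution gives the same expectations.

```python
import numpy as np

# Rounded AR(1) values from Equation (1); Gaussian shocks are an assumption.
gamma, sigma = 0.31, 0.47
x_t = 1.0                     # arbitrary current value (demeaned units)
n_paths = 200_000
rng = np.random.default_rng(0)

# Simulate one and two periods ahead from the same x_t on many independent paths.
x_next = gamma * x_t + sigma * rng.standard_normal(n_paths)       # x_{t+1}
x_next2 = gamma * x_next + sigma * rng.standard_normal(n_paths)   # x_{t+2}

# Sample means approximate E_t[x_{t+1}] and E_t[x_{t+2}].
print(x_next.mean(), gamma * x_t)        # compare with Equation (5)
print(x_next2.mean(), gamma**2 * x_t)    # compare with Equation (6)
```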

So, in general, the time t expectation of any future x_{t+h} will be given by the formula,

(7)   \begin{align*} \mathrm{E}_t[x_{t+h}] &= \gamma^h \cdot x_t, \end{align*}

and the impulse-response function for the AR(1) process will be:

(8)   \begin{align*} \mathrm{Imp}(h) &= \gamma^h \cdot \sigma. \end{align*}
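Equation (8) is easy to tabulate. A minimal sketch, assuming the rounded values \gamma = 0.31 and \sigma = 0.47 from Equation (1); the function name `ar1_impulse_response` is just an illustrative choice.

```python
import numpy as np

def ar1_impulse_response(gamma, sigma, horizons):
    """Imp(h) = gamma**h * sigma, as in Equation (8)."""
    h = np.asarray(horizons)
    return gamma ** h * sigma

imp = ar1_impulse_response(0.31, 0.47, range(5))
for h, value in enumerate(imp):
    print(f"h={h}: {value:.3f} % points")
```

With these inputs, \mathrm{Imp}(1) \approx 0.146 and \mathrm{Imp}(2) \approx 0.045{\scriptstyle \%} points.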

If you knew that there had been a sudden shock to x_t of size \epsilon_t = +1\sigma, then your expectation of x_{t+h} would change by the amount \mathrm{Imp}(h). The figure below plots the impulse-response function for x_t using the AR(1) point estimate from Equation (1).


There’s another, slightly different way to think about an impulse-response function: as the coefficients of the moving-average representation of the time series. Consider rewriting the data-generating process using lag operators,

(9)   \begin{align*} \begin{split} \epsilon_t &= x_t - \gamma \cdot x_{t-1}  \\ &= (1 - \gamma \cdot \mathcal{L}) \cdot x_t, \end{split} \end{align*}

where \mathcal{L} x_t = x_{t-1}, \mathcal{L}^2 x_t = x_{t-2}, and so on… Whenever the slope coefficient is smaller than 1 in absolute value, |\gamma|<1, we know that (1 - \gamma \cdot \mathcal{L})^{-1} = \sum_{h=0}^{\infty}\gamma^h \cdot \mathcal{L}^h, and there exists a moving-average representation of x_t:

(10)   \begin{align*} x_t  &= (1 - \gamma \cdot \mathcal{L})^{-1} \cdot \epsilon_t \\ &= \left( \, 1 + \gamma \cdot \mathcal{L} + \gamma^2 \cdot \mathcal{L}^2 + \cdots \, \right) \cdot \epsilon_t \\ &= \sum_{\ell = 0}^{\infty} \gamma^{\ell} \cdot \epsilon_{t-\ell}. \end{align*}

That is, rather than writing each x_t as a function of a lagged value, \gamma \cdot x_{t-1}, and a contemporaneous shock, \epsilon_t, we can instead represent each x_t as a weighted sum of all the past shocks that’ve been realized, with more recent shocks weighted more heavily.

(11)   \begin{align*} x_t &= \sum_{\ell = 0}^{\infty} \gamma^{\ell} \cdot \epsilon_{t-\ell} \end{align*}
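A numerical check of Equation (11): simulate the AR(1) recursion, then rebuild the final observation from the moving-average weights \gamma^{\ell}. The sketch assumes Gaussian shocks and a zero pre-sample value, x_{-1} = 0, so the finite sum is exact rather than an approximation of the infinite one.

```python
import numpy as np

rng = np.random.default_rng(1)
gamma, sigma, T = 0.31, 0.47, 500

# Simulate x_t = gamma * x_{t-1} + eps_t with x_{-1} = 0 (so x_0 = eps_0).
eps = sigma * rng.standard_normal(T)
x = np.zeros(T)
x[0] = eps[0]
for t in range(1, T):
    x[t] = gamma * x[t - 1] + eps[t]

# Rebuild the last observation as sum_{ell} gamma**ell * eps_{t - ell}.
t = T - 1
ell = np.arange(t + 1)
x_ma = np.sum(gamma ** ell * eps[t - ell])
print(x[t], x_ma)
```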

If we normalize all of the shocks to have unit variance, then the weights themselves will be given by the impulse-response function:

(12)   \begin{align*} x_t &= \sum_{\ell = 0}^{\infty} \gamma^{\ell} \cdot (\sigma \cdot \sigma^{-1}) \cdot \epsilon_{t-\ell} \\ &= \sum_{\ell = 0}^{\infty} \mathrm{Imp}(\ell) \cdot (\sfrac{\epsilon_{t-\ell}}{\sigma}). \end{align*}

Of course, this is exactly what you’d expect for a covariance-stationary process. The impact of past shocks on the current realized value had better be the same as the impact of current shocks on future values.

3. From ARs to VARs

We’ve just seen how to compute the impulse-response function for an AR(1) process. Let’s now examine how to extend this to the setting where there are two time series,

(13)   \begin{align*} x_t &= \gamma_{x,x} \cdot x_{t-1} + \gamma_{x,y} \cdot y_{t-1} + \epsilon_{x,t}, \\ y_t &= \gamma_{y,x} \cdot x_{t-1} + \gamma_{y,y} \cdot y_{t-1} + \epsilon_{y,t}, \end{align*}

instead of just one. This pair of equations can be written in matrix form as follows,

(14)   \begin{align*} \underset{\mathbf{z}_t}{ \begin{bmatrix} x_t \\ y_t \end{bmatrix} } &= \underset{\mathbf{\Gamma}}{ \begin{pmatrix} \gamma_{x,x} & \gamma_{x,y} \\ \gamma_{y,x} & \gamma_{y,y} \end{pmatrix} } \underset{\mathbf{z}_{t-1}}{ \begin{bmatrix} x_{t-1} \\ y_{t-1} \end{bmatrix} } + \underset{\boldsymbol \epsilon_t}{ \begin{bmatrix} \epsilon_{x,t} \\ \epsilon_{y,t} \end{bmatrix} }, \end{align*}

where \mathrm{E}[ {\boldsymbol \epsilon}_t ] = \mathbf{0} and \mathrm{E}[ {\boldsymbol \epsilon}_t {\boldsymbol \epsilon}_t^{\top} ] = \mathbf{\Sigma}. For example, if you think about x_t as the quarterly inflation rate and y_t as the quarterly house-price appreciation rate, then the coefficient matrix \mathbf{\Gamma} is given in Equation (2).

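Nothing about the construction of the moving-average representation of x_t demanded that x_t be a scalar, so we can use the exact same tricks to write the (2 \times 1)-dimensional vector \mathbf{z}_t as a moving average, provided that every eigenvalue of \mathbf{\Gamma} is smaller than 1 in absolute value (the matrix analog of |\gamma| < 1):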

(15)   \begin{align*} \mathbf{z}_t = \sum_{\ell = 0}^{\infty} \mathbf{\Gamma}^{\ell} {\boldsymbol \epsilon}_{t-\ell}. \end{align*}
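Two things are worth checking numerically here: that the estimated \mathbf{\Gamma} from Equation (2) is stable enough for the sum in Equation (15) to converge, and that the vector moving-average weights really are the matrix powers \mathbf{\Gamma}^{\ell}. A minimal sketch, again assuming a zero pre-sample value:

```python
import numpy as np

Gamma = np.array([[0.29, 0.01],
                  [-0.40, 0.50]])   # rounded point estimates from Equation (2)

# The infinite sum in Equation (15) converges only if every eigenvalue of
# Gamma lies strictly inside the unit circle (the analog of |gamma| < 1).
moduli = np.abs(np.linalg.eigvals(Gamma))
is_stationary = bool(np.all(moduli < 1))
print(moduli, is_stationary)

# Simulate the VAR from z_{-1} = 0, so the finite MA sum below is exact.
rng = np.random.default_rng(2)
T = 300
eps = rng.standard_normal((T, 2))   # shock covariance is irrelevant for this identity
z = np.zeros((T, 2))
z[0] = eps[0]
for t in range(1, T):
    z[t] = Gamma @ z[t - 1] + eps[t]

# Rebuild the last observation as sum_{ell} Gamma^ell @ eps_{t - ell}.
z_ma = sum(np.linalg.matrix_power(Gamma, ell) @ eps[T - 1 - ell] for ell in range(T))
print(z[-1], z_ma)
```

With the rounded Equation (2) estimates, the eigenvalues are about 0.48 and 0.31, so the moving-average representation exists.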

But, it’s much less clear in this vector-valued setting how we’d recover the impulse-response function from the moving-average representation. Put differently, what’s the matrix analog of \mathrm{StD}(\epsilon_t) = \sigma?

Let’s apply the want operator. This mystery matrix, let’s call it \mathbf{C}_x^{-1}, has to have two distinct properties. First, it’s got to rescale the vector of shocks, {\boldsymbol \epsilon}_t, into something with unit variance in every direction, that is, something with an identity variance-covariance matrix,

(16)   \begin{align*} \mathbf{I} &= \mathrm{E}\left[ \, (\mathbf{C}_x^{-1} {\boldsymbol \epsilon}_t) (\mathbf{C}_x^{-1} {\boldsymbol \epsilon}_t)^{\top} \, \right], \end{align*}

in the same way that \mathrm{StD}(\sfrac{\epsilon_t}{\sigma}) = 1 in the analysis above. This is why I’m writing the mystery matrix as \mathbf{C}_x^{-1} rather than just \mathbf{C}_x. Second, the matrix has to account for the fact that the shocks, \epsilon_{x,t} and \epsilon_{y,t}, are correlated, so that a 1\sigma shock to the inflation rate tends to be accompanied, on average, by a 0.13\sigma shock to the house-price appreciation rate. Because the shocks to each variable might have different standard deviations (for instance, \mathrm{StD}(\epsilon_{x,t}) = \sigma_x \approx 0.47{\scriptstyle \%} while \mathrm{StD}(\epsilon_{y,t}) = \sigma_y \approx 2.29{\scriptstyle \%}), the expected effect of a 1\sigma_x shock to the inflation rate on the house-price appreciation rate, 0.13 \times 2.29 \approx 0.30{\scriptstyle \%}, will be different from the expected effect of a 1\sigma_y shock to the house-price appreciation rate on the inflation rate, 0.13 \times 0.47 \approx 0.06{\scriptstyle \%}. Thus, each variable in the vector \mathbf{z}_t will have its own impulse-response function. This is why I write the mystery matrix as \mathbf{C}_x^{-1} rather than \mathbf{C}^{-1}.
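The asymmetry comes from projecting one shock on the other. A minimal sketch, using the rounded values from the text and assuming the conditional expectation is linear (exact for jointly Gaussian shocks; otherwise read it as the best linear predictor):

```python
import numpy as np

sigma_x, sigma_y, rho = 0.47, 2.29, 0.13   # rounded values quoted in the text

# Linear projection slopes: E[eps_y | eps_x] = rho * sigma_y / sigma_x * eps_x, and vice versa.
beta_y_on_x = rho * sigma_y / sigma_x
beta_x_on_y = rho * sigma_x / sigma_y

# Expected house-price shock given a 1-sigma_x inflation shock, and the reverse.
y_given_1sd_x = beta_y_on_x * sigma_x   # equals rho * sigma_y
x_given_1sd_y = beta_x_on_y * sigma_y   # equals rho * sigma_x
print(round(y_given_1sd_x, 2), round(x_given_1sd_y, 2))   # about 0.30 and 0.06
```

Note that the slope `beta_y_on_x` is about 0.63, not 0.13: in raw units, a 1{\scriptstyle \%} point inflation surprise comes with roughly a 0.63{\scriptstyle \%} point house-price surprise on average. The correlation of 0.13 is the slope only once both shocks are measured in standard deviations.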

It turns out that, if we pick \mathbf{C}_x to be the lower-triangular Cholesky factor of \mathbf{\Sigma},

(17)   \begin{align*} \mathbf{\Sigma} &= \mathbf{C}_x\mathbf{C}_x^\top, \end{align*}

then, as pointed out in Sims (1980), \mathbf{C}_x^{-1} will have both of the properties we want. The simple 2-dimensional case is really useful for understanding why. To start with, let’s write out the variance-covariance matrix of the shocks, \mathbf{\Sigma}, as follows,

(18)   \begin{align*} \mathbf{\Sigma} &= \begin{pmatrix} \sigma_x^2 & \rho \cdot \sigma_x \cdot \sigma_y \\ \rho \cdot \sigma_x \cdot \sigma_y & \sigma_y^2 \end{pmatrix} \end{align*}

where \rho = \mathrm{Cor}(\epsilon_{x,t},\epsilon_{y,t}). The Cholesky decomposition of \mathbf{\Sigma} can then be solved by hand:

(19)   \begin{align*} \overbrace{ \begin{pmatrix} \sigma_x^2 & \rho \cdot \sigma_x \cdot \sigma_y \\ \rho \cdot \sigma_x \cdot \sigma_y & \sigma_y^2 \end{pmatrix}}^{=\mathbf{\Sigma}} &= \begin{pmatrix} \sigma_x^2 & \rho \cdot \sigma_x \cdot \sigma_y  \\ \rho \cdot \sigma_x \cdot \sigma_y & \rho^2 \cdot \sigma_y^2 + (1 - \rho^2) \cdot \sigma_y^2 \end{pmatrix} \\ &= \underbrace{ \begin{pmatrix} \sigma_x & 0 \\ \rho \cdot \sigma_y & \sqrt{(1 - \rho^2) \cdot \sigma_y^2} \end{pmatrix}}_{=\mathbf{C}_x} \underbrace{ \begin{pmatrix} \sigma_x & \rho \cdot \sigma_y \\ 0 & \sqrt{(1 - \rho^2) \cdot \sigma_y^2} \end{pmatrix}}_{=\mathbf{C}_x^{\top}} \end{align*}
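The hand-derived factor in Equation (19) can be checked against a library routine. This sketch uses the rounded values from the text and relies on `numpy.linalg.cholesky`, which returns the lower-triangular factor by default:

```python
import numpy as np

sigma_x, sigma_y, rho = 0.47, 2.29, 0.13   # rounded values quoted in the text

Sigma = np.array([[sigma_x**2,              rho * sigma_x * sigma_y],
                  [rho * sigma_x * sigma_y, sigma_y**2]])

# Closed-form lower-triangular factor from Equation (19).
C_x = np.array([[sigma_x,       0.0],
                [rho * sigma_y, np.sqrt(1 - rho**2) * sigma_y]])

# numpy.linalg.cholesky returns the lower-triangular L with Sigma = L @ L.T.
L = np.linalg.cholesky(Sigma)
print(np.allclose(C_x, L), np.allclose(C_x @ C_x.T, Sigma))
```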

Since we’re only working with a (2 \times 2)-dimensional matrix, we can also solve for \mathbf{C}_x^{-1} by hand:

(20)   \begin{align*} \mathbf{C}_x^{-1}  &= \frac{1}{(1 - \rho^2)^{\sfrac{1}{2}} \cdot \sigma_x \cdot \sigma_y} \cdot \begin{pmatrix} (1 - \rho^2)^{\sfrac{1}{2}} \cdot \sigma_y & 0 \\ - \rho \cdot \sigma_y & \sigma_x \end{pmatrix} \\ &=  \begin{pmatrix} \frac{1}{\sigma_x} & 0 \\ - \frac{\rho}{(1 - \rho^2)^{\sfrac{1}{2}}} \cdot \frac{1}{\sigma_x} & \frac{1}{(1 - \rho^2)^{\sfrac{1}{2}}} \cdot \frac{1}{\sigma_y} \end{pmatrix} \end{align*}
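Similarly, here is a quick check that the second line of Equation (20) really is the inverse of \mathbf{C}_x (same rounded inputs as before):

```python
import numpy as np

sigma_x, sigma_y, rho = 0.47, 2.29, 0.13   # rounded values quoted in the text
s = np.sqrt(1 - rho**2)

# Lower-triangular Cholesky factor from Equation (19).
C_x = np.array([[sigma_x,       0.0],
                [rho * sigma_y, s * sigma_y]])

# Closed-form inverse from the second line of Equation (20).
C_x_inv = np.array([[1 / sigma_x,              0.0],
                    [-rho / s * (1 / sigma_x), (1 / s) * (1 / sigma_y)]])

print(np.allclose(C_x_inv, np.linalg.inv(C_x)))
print(np.allclose(C_x_inv @ C_x, np.eye(2)))
```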

So, for example, if there is a pair of shocks, {\boldsymbol \epsilon}_t = \begin{bmatrix} \sigma_x & 0 \end{bmatrix}^{\top}, then \mathbf{C}_x^{-1} will convert this shock into:

(21)   \begin{align*} \mathbf{C}_x^{-1}{\boldsymbol \epsilon}_t = \begin{bmatrix} 1 & - \rho \cdot (1 - \rho^2)^{-\sfrac{1}{2}} \end{bmatrix}^{\top}. \end{align*}

In other words, the matrix \mathbf{C}_x^{-1} rescales {\boldsymbol \epsilon}_t to have an identity variance-covariance matrix, \mathrm{E}\!\left[ \, \mathbf{C}_x^{-1} \, {\boldsymbol \epsilon}_t^{\phantom{\top}}\!{\boldsymbol \epsilon}_t^{\top} (\mathbf{C}_x^{-1})^{\top} \right] = \mathbf{I}, and rotates the vector to account for the correlation between \epsilon_{x,t} and \epsilon_{y,t}. To appreciate how the rotation takes into account the positive correlation between \epsilon_{x,t} and \epsilon_{y,t}, notice that the matrix \mathbf{C}_x^{-1} turns the shock {\boldsymbol \epsilon}_t = \begin{bmatrix} \sigma_x & 0 \end{bmatrix}^{\top} into a vector that points 1 standard deviation in the x direction and - \rho \cdot (1 - \rho^2)^{-\sfrac{1}{2}} \approx -0.13 standard deviations in the y direction. That is, given that you’ve observed a positive 1\sigma_x shock, observing a 0\sigma_y shock would be a surprisingly low result.
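Here’s a sketch of both claims in this paragraph: Equation (21) for the shock \begin{bmatrix} \sigma_x & 0 \end{bmatrix}^{\top}, and the identity covariance of \mathbf{C}_x^{-1}{\boldsymbol \epsilon}_t. The simulated shocks are assumed jointly Gaussian, which only matters for the Monte Carlo part:

```python
import numpy as np

sigma_x, sigma_y, rho = 0.47, 2.29, 0.13   # rounded values quoted in the text
Sigma = np.array([[sigma_x**2,              rho * sigma_x * sigma_y],
                  [rho * sigma_x * sigma_y, sigma_y**2]])
C_x = np.linalg.cholesky(Sigma)            # lower-triangular factor
C_x_inv = np.linalg.inv(C_x)

# Equation (21): a 1-sigma_x inflation shock with no house-price shock.
mapped = C_x_inv @ np.array([sigma_x, 0.0])
print(mapped)   # second entry is -rho / sqrt(1 - rho**2)

# Draw correlated shocks and check that C_x^{-1} eps_t has identity covariance.
rng = np.random.default_rng(3)
eps = rng.multivariate_normal(np.zeros(2), Sigma, size=200_000)
u = eps @ C_x_inv.T                        # each row is (C_x^{-1} eps_t)^T
print(np.cov(u, rowvar=False).round(3))
```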

If we plug \mathbf{C}_x^{-1} into our moving-average representation of \mathbf{z}_t, then we get the expression below,

(22)   \begin{align*} \mathbf{z}_t  &= \sum_{\ell = 0}^{\infty} \mathbf{\Gamma}^{\ell} (\mathbf{C}_x \mathbf{C}_x^{-1}) \, {\boldsymbol \epsilon}_{t-\ell}, \\ &= \sum_{\ell = 0}^{\infty} \mathrm{Imp}_x(\ell) \, \mathbf{C}_x^{-1} {\boldsymbol \epsilon}_{t-\ell}, \end{align*}

implying that the impulse-response function for x_t is given by:

(23)   \begin{align*} \mathrm{Imp}_x(h) &= \mathbf{\Gamma}^h \mathbf{C}_x. \end{align*}
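Putting the pieces together, Equation (23) can be computed for every horizon at once. A minimal sketch; `var_impulse_response` is a hypothetical helper name, and the inputs are the rounded estimates quoted in the text rather than the full-precision estimates behind the figures:

```python
import numpy as np

def var_impulse_response(Gamma, Sigma, horizon):
    """Return an array of shape (horizon + 1, k, k) whose entry [h] is
    Gamma^h @ C, with C the lower-triangular Cholesky factor of Sigma.
    Column j of entry [h] holds every variable's response h periods after a
    one-standard-deviation orthogonalized shock to variable j."""
    C = np.linalg.cholesky(Sigma)
    k = Gamma.shape[0]
    out = np.empty((horizon + 1, k, k))
    Gamma_h = np.eye(k)                    # Gamma^0
    for h in range(horizon + 1):
        out[h] = Gamma_h @ C
        Gamma_h = Gamma_h @ Gamma          # advance to Gamma^(h+1)
    return out

Gamma = np.array([[0.29, 0.01],
                  [-0.40, 0.50]])
sigma_x, sigma_y, rho = 0.47, 2.29, 0.13
Sigma = np.array([[sigma_x**2,              rho * sigma_x * sigma_y],
                  [rho * sigma_x * sigma_y, sigma_y**2]])

imp = var_impulse_response(Gamma, Sigma, horizon=8)
# Responses of (x, y) to a shock to x_t sit in the first column at each horizon.
for h in range(4):
    x_resp, y_resp = imp[h][:, 0]
    print(f"h={h}: x {x_resp:+.3f}, y {y_resp:+.3f}")
```

With these rounded inputs, the h = 0 response to a shock to x_t is (0.47, \approx 0.30), and the h = 1 response of y_t is only about -0.04{\scriptstyle \%} points, much smaller than the roughly 0.19-point drop in the motivating example, which ignored the house-price shock that typically accompanies an inflation shock.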

The figure below plots the impulse-response function for both x_t and y_t implied by a unit shock to x_t using the coefficient matrix from Equation (2).

