I show how to use the front door criterion rather than an instrumental variables approach to identify causal effects in non-experimental settings.
Every econometrician is familiar with the experimental ideal:[2] in order to test a hypothesis, a scientist should collect a large group of identical subjects, split them into two groups, administer a treatment to only one of the groups, and then quantify the difference in outcomes between the groups. For instance, consider an application of this experimental design to test the hypothesis that speculative traders destabilize asset prices:
Example (Price Impact of Speculative Trading): Does speculative trading destabilize prices? To answer this question in an idealized world, I would execute the following experiment. First, I would find a large number of identical asset markets for which I knew the true value of the asset. Then, in half of these markets, I would air-drop speculative traders into the market. This half of the markets would be the treatment group, while the remaining half would be the control group.
After allowing the speculators to trade, I would then compare the average volatility of the asset prices centered around their true values in each of the groups. If the asset prices in the treatment group were measurably farther from their fundamental value, I would conclude that the introduction of speculators into an asset market causes mispricing.
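The idealized experiment can be simulated in a few lines. This is a minimal sketch under made-up assumptions: each market's price deviates from its known true value by Gaussian noise, and speculators (hypothetically) inflate that noise by a factor `extra_vol`. Because assignment is random, the difference in average squared deviation between the two halves recovers the effect without any confounding.

```python
import random
import statistics


def run_experiment(n_markets=2000, extra_vol=0.5, seed=0):
    """Return the treatment-minus-control difference in average squared
    price deviation from the true value.

    Assumptions for illustration only: deviations are Gaussian with scale 1,
    and speculators (hypothetically) scale them up by 1 + extra_vol.
    """
    rng = random.Random(seed)
    markets = list(range(n_markets))
    rng.shuffle(markets)                       # random assignment to groups
    treated = set(markets[: n_markets // 2])   # air-drop speculators into half
    sq_dev = {True: [], False: []}
    for m in range(n_markets):
        is_treated = m in treated
        scale = 1.0 + (extra_vol if is_treated else 0.0)
        deviation = rng.gauss(0.0, scale)      # observed price minus true value
        sq_dev[is_treated].append(deviation ** 2)
    return statistics.fmean(sq_dev[True]) - statistics.fmean(sq_dev[False])


print(run_experiment())               # positive: speculators add noise by construction
print(run_experiment(extra_vol=0.0))  # near zero: no effect and no selection bias
```

With `extra_vol=0.5` the population value of the difference is 1.5² − 1 = 1.25, so the estimate should land near that.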
Although this is a nice benchmark, econometricians generally don’t have the luxury of executing this idealized experimental design due to physical, financial or ethical constraints.[3] In practice, the setting above often breaks down, leaving analysts to account for confounding effects such as dissimilar treatment and control groups.
For instance, if the treatment and control groups are observably different, an econometrician can include control variables in a regression framework in order to account for this variation. If the treatment and control groups are unobservably different, i.e. there is endogenous treatment selection, the problem is more challenging. Here, the standard approach in the economics literature has been to use an instrumental variables approach whereby, as an econometrician, I would look for an instrumental variable which co-varies with treatment assignment but not with any confounding effects.
In this post, I describe an alternative route, called the front door criterion, to identifying a causal effect when the treatment and control groups differ in unobservable ways. This approach was introduced by Judea Pearl in the mid-1990s.[4] Rather than focusing on exogenous variation in treatment selection, this approach exploits exogenous variation in the strength of the treatment. Thus, even if agents endogenously selected their broad treatment groups, if there is exogenous within-group variation in how intensely each agent was treated, we can use this variation to identify the causal effect of treatment.
I proceed as follows. First, to build intuition for the approach, in the next section I give examples of how the front door criterion has been used in the economics literature without being explicitly named. I also use these examples to formally define the causal inference problem facing the econometrician in the language of directed graphs outlined in Pearl (2000).
Then, in section 3 I define the front door criterion approach to causal inference explicitly and show how it applies to the problem of identifying the price impact of introducing speculative traders to a market. I also illustrate how to implement this identification strategy using an OLS regression framework.
2. Terminology and Examples
In this section I illustrate how the front door criterion is fundamentally different from the instrumental variables approach and also introduce some additional terminology in the context of examples.
An Introductory Example
The front door criterion has been used without a name in the economics literature since at least the early 1990s, in the form of Blanchard, Katz, Hall and Eichengreen (1992)’s work on macro-labor economics. Cohen and Malloy (2010) execute one of the cleanest quasi-experiments using this approach. These authors are interested in the effect of social ties on congressional voting outcomes. In the example below, I outline their experimental design and discuss their results:
Example (Social Ties and Congressional Voting): Do social ties between U.S. senators affect their voting behavior? For example, when a senator from another state has an important bill to pass, are congressmen who attended the same college more likely to vote in favor of the bill? Cohen and Malloy (2010) find this exact result.
More precisely, the authors find that the fraction of senators in congressman $i$’s alumni network that vote for a given bill predicts congressman $i$’s voting behavior on the bill. So, for example, consider the voting behavior of senator $i$ who attended Harvard. The result reads that senator $i$ is more likely to vote “yes” on the bill when a large fraction of the other congressmen who have a tie to Harvard vote “yes” on a bill relative to when only a small fraction do.
What’s more, this social network effect on congressman $i$’s voting decisions is increasing in the strength of the network. So, for instance, senators who attended the same school at the same time have more correlated votes than senators who just went to the same school. Finally, the results seem to be robust to school, ideology, time period and senator fixed effects.
The trouble with interpreting the results above is that school choice is not an exogenous variable. For instance, students who chose to go to UC Berkeley in the 1960s were very different from those who chose to go to the University of Alabama during the same period. It could be that some omitted variable related to each senator’s upbringing is driving both school choice and voting decisions. In order to solve this problem, Cohen and Malloy (2010) use the front door criterion by exploiting the fact that social ties between congressmen work through discussion on the floor of the Senate chamber (shown below).
Example (Social Ties and Congressional Voting, Ctd…): In order to get a few extra votes for an important bill, a congressman might turn to the people sitting just to his left and right and try to convince them to vote his way. This persuasion becomes increasingly hard the longer a senator must travel to get in a few quiet words with his colleague. Importantly, seating is strictly assigned by seniority, with senior senators getting the best seats and rookie senators getting whatever seats are left over. Thus, the seating of rookie senators is effectively random.
Cohen and Malloy (2010) find that, within the group of rookie senators, the congressmen who were randomly assigned a seat closer to schoolmates had more correlated voting outcomes.
The key idea embedded in this example is that Cohen and Malloy (2010) exploit exogenous variation in the treatment intensity rather than in group assignment. If the authors had followed an instrumental variables approach and looked for exogenous variation in subject group assignment, they would have needed to find an instrument that randomly assigned future senators to different colleges at the age of college entry, a near impossible task. The authors instead identify the pathway through which the social network treatment effect travels (i.e., quiet discussions in the congressional chamber during Senate recesses) and look for exogenous variation in how wide this channel is.
The identification comes from the following comparison. Consider a situation in which two rookie senators, $A$ and $B$, each went to the same school, Faber College, but senator $A$ gets randomly assigned a seat next to other Faber College grads while senator $B$ is isolated from any Faber alumni. Even though both senators selected into the same school treatment group, only senator $A$ has a valid mechanism through which the social network treatment effect can travel. Intuitively, the within-school vote correlation experienced by senator $B$ should be due only to background effects, while the correlation experienced by senator $A$ should be due to both background effects and the social network treatment.
In order to make the above intuition more concrete, I need to introduce some new notation from Pearl (2000). My goal is to be able to extend the natural intuition in the example above to more complicated settings.
Cohen and Malloy (2010) critically rely on defining the precise channel through which the treatment effect travels. If I want to extend their intuition, I first need a more precise definition of what constitutes a causal model (…and thus a causal channel).
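Definition (Causal Model): A causal model is a triple $M = \langle U, V, F \rangle$ where:
- $U$ is a vector of variables that are determined by factors outside the model,
- $V = \{V_1, \ldots, V_n\}$ is a set of variables that are endogenously determined within the model, and
- $F = \{f_1, \ldots, f_n\}$ is a set of functions, where each $f_i$ maps $U \cup (V \setminus V_i)$ to $V_i$,
such that the entire set of functions $F$ forms an acyclic mapping from $U$ to $V$.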
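For simplicity, I will denote the σ-algebra of all realizations of $V$ as $\mathcal{V}$ and the associated probability space as $(\Omega, \mathcal{V}, \Pr)$. So, for example, $\Pr(v)$ is the probability of observing a particular realization $v$, which is induced by the distribution over the exogenous variables:
$$\Pr(v) = \sum_{\{u \,:\, V(u) = v\}} \Pr(u)$$
where $V(u)$ is the realization of $V$ implied by $U = u$ under $F$.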
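The triple $\langle U, V, F \rangle$ can be mirrored directly in code. The sketch below is a hypothetical encoding (the class and method names are mine, not Pearl’s): each endogenous variable lists its parents and a function of them, and `solve` maps a realization of $U$ into the implied realization of $V$, failing when $F$ does not form an acyclic mapping.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class CausalModel:
    """Hypothetical encoding of a causal model M = <U, V, F>."""
    exogenous: List[str]                          # names of the variables in U
    parents: Dict[str, List[str]]                 # arguments of f_i for each V_i
    functions: Dict[str, Callable[..., float]]    # F: one function per V_i

    def solve(self, u: Dict[str, float]) -> Dict[str, float]:
        """Map a realization of U to the implied realization of V."""
        values = dict(u)
        pending = set(self.parents)
        while pending:
            # Evaluate every V_i whose arguments are already known.
            ready = [v for v in pending if all(p in values for p in self.parents[v])]
            if not ready:
                # No progress is possible, e.g. because the graph contains a cycle.
                raise ValueError("F is not an acyclic mapping from U to V")
            for v in ready:
                values[v] = self.functions[v](*(values[p] for p in self.parents[v]))
                pending.remove(v)
        return {v: values[v] for v in self.parents}


# A causal chain U -> X -> Y (functional forms are made up for illustration).
chain = CausalModel(
    exogenous=["U"],
    parents={"X": ["U"], "Y": ["X"]},
    functions={"X": lambda u: 2 * u, "Y": lambda x: x + 1},
)
print(chain.solve({"U": 3.0}))  # {'X': 6.0, 'Y': 7.0}
```

Storing the $f_i$ explicitly, rather than only the graph, is what later lets us talk about the strength of a connection and not just its existence.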
This definition has two nice features. First, as Pearl (2000) emphasizes, it allows for the representation of causal models as graphs where each node is a random variable and each directed edge is an effect. For instance, in panel (a) of the figure below, I show how to graph a causal chain in which the outcome variable $Y$ is affected by an endogenous variable $X$, which is in turn affected by an exogenous variable $U$ that has no direct effect on $Y$. In panel (b) of the figure, I show a graph of a causal model that is not well defined. Here, there are no exogenous variables on which to stand. The values of $X$ are determined by the values of $Y$, which are determined by the values of $Z$, which are in turn determined by the values of $X$. Finally, in panel (c) of the figure, I show a graph of a causal model which would admit identification via an instrumental variables approach. Here, even though $X$ has a direct effect on $Y$, there exists a confounding variable $U$ which jointly determines both $X$ and $Y$. The instrumental variables approach suggests that an analyst use the exogenous variation in $X$ predicted by an instrument $Z$ to circumvent the effect of $U$.
Second, this definition explicitly models the channels through which different variables interact in the form of the functions $f_i$. This allows us to think not just about which nodes are connected in the graphs above, but also about the strength and nature of each connection.
With this definition in hand, I now need to explicitly define how to quantify a causal effect:
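Whether a graph describes a well-defined causal model comes down to acyclicity, which the standard library can check. The sketch below encodes the three panels as `{node: set of parents}` (the letter names are the ones used in the text, and panel (c) assumes the instrument is called $Z$); `graphlib.TopologicalSorter` (Python 3.9+) raises `CycleError` when no evaluation order exists.

```python
from graphlib import CycleError, TopologicalSorter

# Each panel maps a node to the set of its parents.
panels = {
    "a": {"X": {"U"}, "Y": {"X"}},              # causal chain U -> X -> Y
    "b": {"X": {"Y"}, "Y": {"Z"}, "Z": {"X"}},  # cycle: no exogenous starting point
    "c": {"X": {"Z", "U"}, "Y": {"X", "U"}},    # IV: Z -> X -> Y, U confounds X and Y
}


def is_well_defined(graph):
    """True if the graph is acyclic, i.e. the variables can be solved in order."""
    try:
        tuple(TopologicalSorter(graph).static_order())
        return True
    except CycleError:
        return False


for name, graph in panels.items():
    print(name, is_well_defined(graph))
# a True
# b False
# c True
```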
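Definition (Causal Effect): Let $M$ be a causal model, $X$ be a particular variable of the causal model $M$, and $x$ be a particular realization of this variable. Then, the effect of taking the action $do(X = x)$ rather than $do(X = x')$ can be written as a distance metric $d$ over the elements of the probability space $(\Omega, \mathcal{V}, \Pr)$:
$$\tau(x, x') = d\big[\Pr(\mathcal{A} \mid do(X = x)),\ \Pr(\mathcal{A} \mid do(X = x'))\big]$$
where $\mathcal{A}$ is a collection of events in the σ-algebra $\mathcal{V}$, and $\Pr(\cdot \mid do(X = x))$ is defined as,
$$\Pr(v \mid do(X = x)) = \Pr_{M_x}(v)$$
with $M_x$ denoting the submodel of $M$ in which the function determining $X$ is replaced by the constant $X = x$.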
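The gap between intervening and merely conditioning can be seen in a small simulation. This is a toy confounded model with made-up coefficients (not from the text): $U \sim \text{Bernoulli}(0.5)$, $X$ is more likely to be 1 when $U = 1$, and $Y = X + 2U + \text{noise}$. Conditioning on $X$ picks up the confounder, while $do(X = x)$ replaces the equation for $X$ with a constant.

```python
import random
import statistics


def simulate(n, rng, do_x=None):
    """Draw (x, y) pairs; if do_x is given, replace X's equation with X = do_x."""
    rows = []
    for _ in range(n):
        u = 1 if rng.random() < 0.5 else 0
        if do_x is None:
            x = 1 if rng.random() < (0.8 if u else 0.2) else 0  # X depends on U
        else:
            x = do_x  # the intervention cuts the U -> X arrow
        y = 1.0 * x + 2.0 * u + rng.gauss(0.0, 1.0)
        rows.append((x, y))
    return rows


rng = random.Random(1)
obs = simulate(50_000, rng)
naive = (statistics.fmean(y for x, y in obs if x == 1)
         - statistics.fmean(y for x, y in obs if x == 0))
causal = (statistics.fmean(y for _, y in simulate(50_000, rng, do_x=1))
          - statistics.fmean(y for _, y in simulate(50_000, rng, do_x=0)))
print(round(naive, 2), round(causal, 2))
```

In this model the conditional difference converges to $1 + 2(0.8 - 0.2) = 2.2$, whereas the interventional difference converges to the true coefficient, 1.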
Perhaps the best way to parse this definition is to walk through a few examples. First, I consider how to represent an average treatment effect using this definition. This effect is the standard estimator used throughout much of the economics literature as well as in other fields such as pharmaceutical testing. Below I walk through this implementation:
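Example (Average Treatment Effect): Consider a setting where $d$ is just the conditional expectation operator for variable $Y$, i.e.:
$$d\big[\Pr(\mathcal{A} \mid do(X = x)),\ \Pr(\mathcal{A} \mid do(X = x'))\big] = \mathbb{E}[Y \mid do(X = x), \mathcal{A}] - \mathbb{E}[Y \mid do(X = x'), \mathcal{A}]$$
Under these conditions, if we let $\mathcal{A}$ be the entire σ-algebra $\mathcal{V}$, we get the standard average treatment effect:
$$\mathrm{ATE} = \mathbb{E}[Y \mid do(X = 1)] - \mathbb{E}[Y \mid do(X = 0)]$$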
Heckman and Vytlacil (2005) argue that analysts should think hard about whether or not the commonly used ATE estimator is appropriate for their purposes. As an alternative, they suggest estimating marginal treatment effects:
Example (Marginal Treatment Effect): Consider a setting where $d$ is still the conditional expectation operator, but restrict $\mathcal{A}$ to be the subsets of the event space over which agents would be indifferent between the treatment and control assignments, denoted $\mathcal{A}^{\star}$:
$$\mathrm{MTE} = \mathbb{E}[Y \mid do(X = 1), \mathcal{A}^{\star}] - \mathbb{E}[Y \mid do(X = 0), \mathcal{A}^{\star}]$$
We can interpret $\mathrm{MTE}$ as the mean change in outcomes for subjects who would be indifferent between receiving the treatment or not. Finally, consider transforming the rough explanation of the causal effect given in the introductory example into more formal language:
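The difference between the two estimands is easy to see in a simulated selection model. This sketch assumes a hypothetical Roy-style setup that does not come from the text: each agent’s gain $G = Y^1 - Y^0 \sim N(1, 1)$, an independent cost $C \sim N(1.5, 1)$, and treatment is chosen when $G > C$. The ATE averages $G$ over everyone; the MTE averages it over (approximately) indifferent agents, taken here as $|G - C| < 0.05$.

```python
import random
import statistics

# Hypothetical Roy-style model: gain G ~ N(1, 1), cost C ~ N(1.5, 1), independent;
# an agent would select treatment when G > C.
rng = random.Random(2)
draws = [(rng.gauss(1.0, 1.0), rng.gauss(1.5, 1.0)) for _ in range(200_000)]

# ATE: average gain over the whole population (A = the entire sigma-algebra).
ate = statistics.fmean(g for g, _ in draws)

# MTE: average gain among agents who are (nearly) indifferent, |G - C| < band.
band = 0.05
mte = statistics.fmean(g for g, c in draws if abs(g - c) < band)

print(round(ate, 2), round(mte, 2))
```

For these parameters the population values are $\mathrm{ATE} = 1$ and $\mathbb{E}[G \mid G = C] = 1 + \tfrac{1}{2}(0.5) = 1.25$: the marginal agents gain more than the average agent because they face higher-than-average costs.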
Example (Price Impact of Speculative Trading, Ctd…): Consider the example in the introduction concerning the potential destabilizing effects of speculative trading. The effect outlined in this introductory example can be written more formally as follows. Suppose that $Y$ is the deviation of the observed price from the fundamental value in a market and the variable $X \in \{0, 1\}$ is binary, representing the absence or existence of speculators in a market. Define $d$ as the difference in mean squared deviations:
$$d\big[\Pr(\mathcal{A} \mid do(X = 1)),\ \Pr(\mathcal{A} \mid do(X = 0))\big] = \mathbb{E}[Y^2 \mid do(X = 1), \mathcal{A}] - \mathbb{E}[Y^2 \mid do(X = 0), \mathcal{A}]$$
Then the estimator described heuristically in the introductory example would be given by $d$ where $\mathcal{A}$ is the entire σ-algebra $\mathcal{V}$.
An Additional Example
Finally, with this new terminology in hand, I want to visit a second, more complicated application of the front door criterion by Blanchard, Katz, Hall and Eichengreen (1992) to study the hypothesis that, at the state level, an increase in the immigration rate decreases the unemployment rate. Below, I describe a stylized version of their results:
Example (Immigration Choice and the Unemployment Rate): States with the lowest levels of unemployment tend to enjoy the largest immigrant populations. Is this relationship causal? In this example, I consider the alternative hypothesis that an increase in a state’s immigrant population makes its economy more efficient and lowers its unemployment rate against the null hypothesis that some omitted variable co-determines both a state’s immigrant population percentage and its unemployment rate. For instance, immigrants might rationally choose to move to the states with the highest labor demand.
The key insight to identifying the causal effect of a state’s population makeup on its unemployment rate is to pin down the mechanism through which this effect flows. Specifically, observe that, if changes in a state’s immigrant population have a causal effect on its unemployment rate, then this effect should be more pronounced in states where immigrants form a larger fraction of the population to start with. For example, if there is a causal link, a change in the immigrant population of California ought to have a larger effect than the same change in Iowa; whereas, if there is no causal link and some omitted variable is co-determining both the immigrant population percentage and the unemployment rate, this link to the absolute number of immigrants added need not exist. Indeed, this is roughly what Blanchard, Katz, Hall and Eichengreen (1992) find, as shown in figure 8 of the original paper.
Using the definitions from above, we can now cast this identification strategy as instrumenting for an exogenous shift in the function $f$ which maps the immigrant population percentage to the unemployment rate. Let $f_{CA}$ be the function in California and $f_{IA}$ be the function in Iowa, and let $\Delta_H$ and $\Delta_L$ denote a large and a small change in a state’s immigrant population. Then, roughly speaking, I can write the causal estimator as:
$$\tau = \big[f_{CA}(\Delta_H) - f_{CA}(\Delta_L)\big] - \big[f_{IA}(\Delta_H) - f_{IA}(\Delta_L)\big]$$
In words, $\tau$ captures how much more the unemployment rate shifts when the change in a state’s immigrant population is large versus when it is small in the California regime as compared to the Iowa regime. Note that this is exactly the same intuition as in the social ties example above.
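The estimator is a difference of differences, which a few lines make concrete. The linear response functions and their slopes below are purely hypothetical (they are not estimates from Blanchard, Katz, Hall and Eichengreen): each maps a change in the immigrant share to a change in the unemployment rate, with a stronger response in the California regime.

```python
# Hypothetical response functions (made-up slopes, illustration only):
# change in the unemployment rate as a function of the change in immigrant share.
def f_ca(delta):  # California regime: larger existing immigrant share
    return -0.30 * delta


def f_ia(delta):  # Iowa regime: smaller existing immigrant share
    return -0.05 * delta


def diff_in_diff(f_high, f_low, delta_large, delta_small):
    """tau = [f_high(large) - f_high(small)] - [f_low(large) - f_low(small)]."""
    return ((f_high(delta_large) - f_high(delta_small))
            - (f_low(delta_large) - f_low(delta_small)))


tau = diff_in_diff(f_ca, f_ia, delta_large=2.0, delta_small=0.5)
print(tau)  # about -0.375: the large change lowers unemployment more in CA
```

If the two regimes responded identically (no causal link running through the size of the existing immigrant population), the two brackets would cancel and $\tau$ would be zero.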
3. The Front Door Criterion
In this section I link the front door criterion to a regression-based estimation strategy.
Identifying Speculative Price Impact
Why is identifying the price impact of speculators hard? What is the basic inference problem here? Consider the following example, and ask yourself: “Is it right to conclude that speculators caused the massive run-up in prices?”
Example (Price Impact of Speculative Trading, Ctd…): Suppose that you look at the stock market and observe Amazon’s stock price sky-rocketing over the course of a single year.
What’s more, after mulling it over, suppose that you also conduct a survey of all tech-stock traders and discover that a large fraction of them were speculators. When asked, they responded that they were buying the stock just to resell it later at a higher price and placed no weight on any dividend payment concerns. To many outside observers, this seems like an air-tight case that speculators are driving up the stock price of Amazon.com; however, it turns out that you can’t be so sure.
Speculators only show up when prices are out of whack. For instance, if I am a speculator making my living off of shorting over-priced assets, buying under-priced assets and riding waves of excessive price changes, I would never enter a market with correct prices. There would be no money to be made. As a speculator, I could well be having a stabilizing effect on prices. Thus, the core challenge is to determine whether speculators are showing up in response to mis-pricing or instead causing mis-pricing via their trading behavior. This is the argument that Milton Friedman put forth in his 1953 book, Essays in Positive Economics.
To make this intuition a bit more concrete, I now walk through a simple numerical example.[5] My goal in this section is to lay out the simplest possible model in which it is feasible to study the inference problem above.
Consider a world containing $N$ markets in which prices in each market are either correct or too high. There are no shades of grey, and prices can never be too low. What’s more, suppose that speculators either abstain from or enter each market, with $X_n \in \{0, 1\}$ indicating entry into market $n$. Thus, the price in market $n$, coded as $Y_n = 1$ if it is too high and $Y_n = 0$ if it is correct, can be written as below, where the superscripts in $Y_n^0$ and $Y_n^1$ represent the counterfactual states of the world in which speculators either abstain from or enter market $n$:
$$Y_n = X_n Y_n^1 + (1 - X_n) Y_n^0$$
This formulation allows me to ask questions about counterfactuals. Even though in empirically observed data I will only ever see either $Y_n^0$ or $Y_n^1$, I want an econometric framework in which I can think about both of these outcomes. For instance, I am interested in questions like: “Suppose speculators entered market $m$ but not market $n$. Would the prices in market $m$ have been the same as they are in market $n$ if no speculators had entered?”
From here, I can derive an OLS specification which maps this binary inference framework into a regression specification:
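Proposition: (OLS Regression) Let $Y$ be a random variable representing the price in an arbitrary market, $X$ be a random variable representing the existence of speculators in an arbitrary market, and let the superscripts $1$ and $0$ denote the values in an economy which contains/does not contain speculators. Then an OLS regression $Y = \alpha + \beta X + \varepsilon$ has the components:
$$\alpha = \mu^0, \qquad \beta = \mu^1 - \mu^0, \qquad \varepsilon = \upsilon^0 + X(\upsilon^1 - \upsilon^0)$$
where $\mu^x = \mathbb{E}[Y^x]$ and $\upsilon^x = Y^x - \mathbb{E}[Y^x]$.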
This proposition says that an OLS regression has an intercept which is the expected price in a market that has not been treated with speculators and a slope which is the change in the expected price in a market due to treatment with speculators.
However, the most helpful part of this proposition is actually the factorization of the pricing error $\varepsilon$ into components. The first component, $\upsilon^0$, is the difference between the observed prices in untreated markets and the expected price in an untreated market. Its average over the untreated markets will be nonzero if, for example, markets with correct prices are less likely to attract speculators. Conversely, the second component, $X(\upsilon^1 - \upsilon^0)$, depends on whether or not the treated markets are differentially more likely to be over-priced relative to their expected levels.
Proof: (OLS Regression) To derive the formulation above, start with the simple decomposition:
$$\begin{aligned} Y &= Y^0 + X(Y^1 - Y^0) \\ &= \mu^0 + (\mu^1 - \mu^0)X + \big\{\upsilon^0 + X(\upsilon^1 - \upsilon^0)\big\} \end{aligned}$$
and match terms with $Y = \alpha + \beta X + \varepsilon$.
This decomposition is useful because it tells us exactly where the identification problem will show up in the naive OLS regression framework. If speculators are more likely to show up in markets with excessively high prices, we should expect to see a distorted error term $\varepsilon$ whose average differs between markets with and without speculators. The standard way to get around this problem is to use an instrumental variables approach. However, instruments are hard to come by. In the section below, I show how to use a new approach to identify the price impact of adding speculators to a market.
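The decomposition and the resulting OLS bias can be checked numerically. In this sketch every parameter is hypothetical: a latent “hype” shock raises both the chance that a market is over-priced and the chance that speculators enter, and speculators add only a small push toward over-pricing. Because both potential outcomes are simulated, we can compare the true effect $\mu^1 - \mu^0$ with the OLS slope (for a binary regressor, the difference in group means).

```python
import random
import statistics

# Simulated potential outcomes with endogenous entry (all parameters hypothetical).
rng = random.Random(3)
markets = []
for _ in range(20_000):
    hype = rng.random() < 0.3
    y0 = 1 if rng.random() < (0.7 if hype else 0.1) else 0   # over-priced without speculators
    y1 = 1 if (y0 == 1 or rng.random() < 0.1) else 0         # speculators add a small push
    x = 1 if rng.random() < (0.8 if hype else 0.2) else 0     # speculators enter hyped markets
    markets.append((x, y0, y1))

mu0 = statistics.fmean(y0 for _, y0, _ in markets)
mu1 = statistics.fmean(y1 for _, _, y1 in markets)
ups0 = [y0 - mu0 for _, y0, _ in markets]  # v^0 = Y^0 - E[Y^0]
ups1 = [y1 - mu1 for _, _, y1 in markets]  # v^1 = Y^1 - E[Y^1]

# Rebuild the observed outcome from the decomposition and check it matches Y.
y_obs = [mu0 + (mu1 - mu0) * x + (v0 + x * (v1 - v0))
         for (x, _, _), v0, v1 in zip(markets, ups0, ups1)]
assert all(abs(y - (y1 if x else y0)) < 1e-9 for y, (x, y0, y1) in zip(y_obs, markets))

# With a binary regressor, the OLS slope equals the difference in group means.
ols_slope = (statistics.fmean(y for y, (x, _, _) in zip(y_obs, markets) if x == 1)
             - statistics.fmean(y for y, (x, _, _) in zip(y_obs, markets) if x == 0))
bias = (statistics.fmean(v1 for v1, (x, _, _) in zip(ups1, markets) if x == 1)
        - statistics.fmean(v0 for v0, (x, _, _) in zip(ups0, markets) if x == 0))
print(f"true effect {mu1 - mu0:.3f}, OLS slope {ols_slope:.3f}, bias {bias:.3f}")
```

The OLS slope equals the true effect plus the gap between the average $\upsilon^1$ among treated markets and the average $\upsilon^0$ among untreated markets, and with speculators chasing hype that gap is large and positive.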
Regression Estimate of the Causal Effect
The last section illustrated how a naive OLS regression specification will deliver biased estimates of the price impact of adding speculators to a financial market if the speculators endogenously choose which markets to enter. What’s more, the decomposition in Proposition 1 details exactly how this bias manifests itself: either as a negative $\upsilon^0$ term (untreated markets are unusually well priced) or as a positive $X(\upsilon^1 - \upsilon^0)$ term (treated markets are unusually over-priced). In this section, I introduce the front door criterion as a way to circumvent this identification problem and estimate this price impact.
Specifically, I show that the causal effect can be calculated as follows:
Proposition: (Causal Effect Estimator) The causal effect estimator using the front door criterion in a 2-state system with outcome variable $Y$, mechanism $Z$ and explanatory variable $X$ can be written as:
$$\begin{aligned} \tau = \;&\big[\Pr(Y = 1 \mid X = 1, Z = 1) - \Pr(Y = 1 \mid X = 0, Z = 1)\big] \\ &- \big[\Pr(Y = 1 \mid X = 1, Z = 0) - \Pr(Y = 1 \mid X = 0, Z = 0)\big] \end{aligned}$$
In practical terms, what is this proposition saying? Well, suppose that I estimated some $\hat{\tau} > 0$. This proposition would then read that the probability of over-pricing in market $n$ in a world where $X_n = 1$ is $\hat{\tau}$ higher than in the exact same market in a counterfactual world where $X_n = 0$. Before I can explain the proposition in more detail, I need to define the variable $Z$:
Definition: (Mechanism) A variable $Z$ is a mechanism relative to the ordered pair of variables $(X, Y)$ if 1) $X$ only affects $Y$ through $Z$, and 2) $Z$ is independent of any confounding variables affecting both $X$ and $Y$.
$Z$ is called a “mechanism” because all of the effect of the explanatory variable $X$ on the outcome variable $Y$ travels through $Z$. This is also where the name front door criterion comes from. For example, I use exogenous variation in the amount of funds available to speculators as my mechanism, where $Z = 0$ means that speculators have little free cash and $Z = 1$ means that they have a ton of free cash. So, while speculators can still choose which markets to enter, they may or may not have sufficient funds to really affect the market equilibrium. Thus, $Z$ acts like an instrument for the intensity of a (…perhaps endogenously selected…) treatment.
To make these ideas more tangible, consider the fake data table below to confirm the estimator computation. This table lays out the different states of the world that we can possibly observe with respect to the pair of variables $(X, Z)$, as well as their relative frequency and the probability of observing over-pricing in each of these states.
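The estimator is simple arithmetic on such a table. The counts below are hypothetical illustration values: each $(X, Z)$ cell records how many markets fall in that state and how many of them are over-priced, which gives both the relative frequency of each state and $\Pr(Y = 1 \mid X, Z)$. The function computes the first line minus the second line, $[\Pr(Y=1 \mid X=1, Z=1) - \Pr(Y=1 \mid X=0, Z=1)] - [\Pr(Y=1 \mid X=1, Z=0) - \Pr(Y=1 \mid X=0, Z=0)]$.

```python
# Hypothetical fake-data table: (X, Z) -> (number of markets, number over-priced).
table = {
    (1, 1): (200, 120),
    (0, 1): (300, 60),
    (1, 0): (250, 100),
    (0, 0): (250, 50),
}


def pr_overpriced(x, z):
    """Pr(Y = 1 | X = x, Z = z) estimated from the cell counts."""
    n_markets, n_over = table[(x, z)]
    return n_over / n_markets


def front_door_did(pr):
    high_capital = pr(1, 1) - pr(0, 1)  # first line: speculators flush with cash (Z = 1)
    low_capital = pr(1, 0) - pr(0, 0)   # second line: speculators short on cash (Z = 0)
    return high_capital - low_capital


tau_hat = front_door_did(pr_overpriced)
print(round(tau_hat, 3))  # 0.2
```

Here speculator entry is associated with a 0.4 rise in the probability of over-pricing when speculators are well funded but only a 0.2 rise when they are not, so $\hat\tau = 0.2$.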
Not every channel is a valid mechanism, though. $Z$ needs to have the additional property that any residual variation in $Z$ not explained by variation in $X$ is uncorrelated with any confounding variables.[6] Let me make this idea clearer via an example of how $Z$ might violate this requirement. Consider some confounding variable $W$ that makes speculators enter a market and makes prices overheat. For instance, think of $W$ as a dummy variable for whether or not the New York Times wrote a news article about a company.[7] For $Z$ to be an invalid mechanism, it would have to be the case that speculators tend to have unexpectedly more funds precisely when the New York Times is most likely to write an article about an industry; i.e., an invalid mechanism would yield:
$$\mathrm{Cov}\big(Z - \mathbb{E}[Z \mid X],\ W\big) > 0$$
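The validity condition can be checked in simulation, where the confounder is observable. In this sketch (all probabilities are made up) $W$ is a hypothetical news-article dummy that makes speculators more likely to enter; under the valid design the funding shock $Z$ ignores $W$, while under the invalid design funds rise with the story. $\mathbb{E}[Z \mid X]$ is estimated by the mean of $Z$ within each $X$ cell.

```python
import random


def residual_cov_with_confounder(rows):
    """Sample Cov(Z - E[Z | X], W), estimating E[Z | X] by the mean of Z in each X cell."""
    cell_mean = {}
    for x in (0, 1):
        zs = [z for xx, z, _ in rows if xx == x]
        cell_mean[x] = sum(zs) / len(zs)
    resid = [z - cell_mean[x] for x, z, _ in rows]
    ws = [w for _, _, w in rows]
    n = len(rows)
    r_bar, w_bar = sum(resid) / n, sum(ws) / n
    return sum((r - r_bar) * (w - w_bar) for r, w in zip(resid, ws)) / (n - 1)


def simulate(invalid, n=50_000, seed=4):
    """Rows of (X, Z, W) under a hypothetical data-generating process."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        w = 1 if rng.random() < 0.3 else 0                   # news-article dummy
        x = 1 if rng.random() < (0.7 if w else 0.2) else 0   # speculators chase the story
        p_z = (0.8 if w else 0.3) if invalid else 0.5        # invalid: funds rise with the story
        z = 1 if rng.random() < p_z else 0
        rows.append((x, z, w))
    return rows


cov_valid = residual_cov_with_confounder(simulate(invalid=False))
cov_invalid = residual_cov_with_confounder(simulate(invalid=True))
print(round(cov_valid, 3), round(cov_invalid, 3))
```

The valid design gives a covariance close to zero, while the invalid one gives a clearly positive covariance (about 0.08 in population for these parameters). In real data $W$ is unobserved, so this condition has to be argued rather than tested.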
Having outlined the basic elements of the proposition, I now give an intuitive explanation and refer any interested readers to Pearl (2000) for the formal details:
Intuition: (Causal Effect Estimator) Consider the first line. This difference captures the increase in the likelihood of over-pricing (i.e., $Y = 1$) due to speculators entering a market (i.e., $X = 1$ rather than $X = 0$) when they have a lot of capital and should have a larger effect (i.e., $Z = 1$). Now, consider the second line. This difference captures the increase in the likelihood of over-pricing (i.e., $Y = 1$) due to speculators entering a market (i.e., $X = 1$ rather than $X = 0$) when they do not have very much capital and should have a smaller effect (i.e., $Z = 0$).
Now, suppose that the correlation between the existence of speculators in market $n$ and over-pricing in market $n$ is purely spurious and due to some confounding variable that drives up prices and speculator demand at the same time. In this world, we should expect the differences in both lines to be the same, so that $\tau = 0$. Opening and closing the nozzle on an unconnected hose should have no effect on the amount of water coming out of it.
On the other hand, suppose that the effect is not entirely spurious. Then unexpectedly giving speculators more funds will cause prices to rise relatively more, leading to an estimate of $\tau > 0$.
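The hose analogy can be run as a simulation. In this toy model, in the spirit of the funding example and with hypothetical parameters throughout, a confounder drives both speculator entry and over-pricing, the funding shock $Z$ is a fair coin, and speculators raise the probability of over-pricing by `causal_effect` only when they are well funded. The function returns the difference between the two lines of the estimator.

```python
import random


def simulate_tau(causal_effect, n=200_000, seed=5):
    """Difference of differences over Z in a toy market model (hypothetical parameters)."""
    rng = random.Random(seed)
    counts = {(x, z): [0, 0] for x in (0, 1) for z in (0, 1)}  # (X, Z) -> [markets, over-priced]
    for _ in range(n):
        w = rng.random() < 0.3                                  # confounder, e.g. hype
        x = 1 if rng.random() < (0.7 if w else 0.2) else 0      # endogenous entry
        z = 1 if rng.random() < 0.5 else 0                       # exogenous funding shock
        p_over = (0.6 if w else 0.1) + causal_effect * x * z     # impact only with funds
        y = 1 if rng.random() < p_over else 0
        counts[(x, z)][0] += 1
        counts[(x, z)][1] += y
    pr = {cell: over / total for cell, (total, over) in counts.items()}
    return (pr[(1, 1)] - pr[(0, 1)]) - (pr[(1, 0)] - pr[(0, 0)])


tau_spurious = simulate_tau(causal_effect=0.0)  # hose not connected
tau_causal = simulate_tau(causal_effect=0.2)    # hose connected
print(round(tau_spurious, 3), round(tau_causal, 3))
```

In the spurious world both lines pick up the same confounding, so the estimate is near zero; in the causal world it is near 0.2, even though speculators are still endogenously chasing hype in both worlds.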
Note that this approach does not rule out the possibility that speculators are still endogenously choosing to invest in over-priced markets to some degree. For instance, consider a world where:
$$\Pr(Y = 1 \mid X = 1, Z = 0) - \Pr(Y = 1 \mid X = 0, Z = 0) > 0$$
In this world, even where speculators can have little price impact, they are still good predictors of over-pricing.

Finally, I show how to implement this estimator using a 2-stage regression. This step is an immediate extension of the previous section, and I give the main result below:
Proposition: (2-Stage Regression) Consider a system of variables $(Y, X, Z)$ as outlined above. The following 2-stage regression procedure is an unbiased and consistent estimator of $\tau$ if $Z$ is a valid mechanism.
This result follows directly from reading the first differencing in a 2-state framework as a reduced-form projection.
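As a related illustration (a single saturated regression, which is my assumption and not necessarily the two-stage procedure in the proposition), the link between the differencing and a regression can be checked directly: with binary $X$ and $Z$, an OLS regression of $Y$ on a constant, $X$, $Z$ and $X \cdot Z$ fits the four cell means exactly, so its interaction coefficient equals $\tau$. The data-generating parameters are hypothetical, and OLS is solved with plain normal equations to stay dependency-free.

```python
import random


def ols(rows):
    """OLS coefficients from the normal equations (X'X) b = X'y via Gauss-Jordan elimination."""
    k = len(rows[0][0])
    xtx = [[0.0] * k for _ in range(k)]
    xty = [0.0] * k
    for xs, y in rows:
        for i in range(k):
            xty[i] += xs[i] * y
            for j in range(k):
                xtx[i][j] += xs[i] * xs[j]
    aug = [row + [rhs] for row, rhs in zip(xtx, xty)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))  # partial pivoting
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [lhs - factor * piv for lhs, piv in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


# Toy data in the spirit of the funding example (all parameters hypothetical).
rng = random.Random(6)
rows, cells = [], {(x, z): [] for x in (0, 1) for z in (0, 1)}
for _ in range(20_000):
    w = rng.random() < 0.3                                   # confounder
    x = 1 if rng.random() < (0.7 if w else 0.2) else 0       # endogenous entry
    z = 1 if rng.random() < 0.5 else 0                        # exogenous funding shock
    y = 1 if rng.random() < (0.6 if w else 0.1) + 0.2 * x * z else 0
    rows.append(((1.0, x, z, x * z), y))                      # intercept, X, Z, X*Z
    cells[(x, z)].append(y)

b = ols(rows)
mean = {cell: sum(ys) / len(ys) for cell, ys in cells.items()}
tau_cells = (mean[(1, 1)] - mean[(0, 1)]) - (mean[(1, 0)] - mean[(0, 0)])
print(b[3], tau_cells)  # equal up to floating-point error
```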
This identification strategy is new, and as a result there are a lot of places where using the front door criterion might yield new results for tough econometric problems. The key advantage is the flexibility to randomize the intensity of the treatment rather than the treatment assignment as in the standard IV framework.
- [1] I am currently working on a paper with Chris Mayer in which we use this identification strategy to parse the causal effect of introducing out-of-town second-home buyers into a housing market on local house prices. In this setting we use relative city size rather than funding constraints as our mechanism. We find that air-dropping out-of-town speculators into a housing market causes house price appreciation.
- [2] See Angrist and Pischke (2010).
- [3] There are some settings, such as development economics (e.g., see Banerjee and Duflo (2008)), where this natural experiment approach is feasible. However, for the majority of econometric questions this approach is difficult to implement. Much of econometrics research is an effort to bridge this gap, with varying levels of success (e.g., see Lalonde (1986)).
- [4] See Pearl (1995) and Pearl (2000).
- [5] The analytical framework in this section comes from Ch. 3 of Morgan and Winship (2007).
- [6] This is admittedly a very rough statement; for a more detailed treatment of this idea, read through Ch. 3 of Pearl (2000). For brevity’s sake, I trimmed much of my original discussion of the nuts and bolts of graphical models of causality. In my opinion, this is the cleanest way of looking at causal inference, and I highly recommend this text.
- [7] See Huberman and Regev (2002) for a real-world example in the pharmaceutical industry.