Friday, March 11, 2011

The Causal Markov Axiom

In his amazing essay, "How to Make Our Ideas Clear," written for Popular Science Monthly (of all places!), the polymath C.S. Peirce describes three stages of clarity for concepts:

1. Ability to recognize or apply a concept;
2. Ability to abstractly define a concept;
3. Ability to identify the practical consequences of a concept.

The third is the highest grade of conceptual clarity and forms the basis for the philosophical position called pragmatism. Peirce writes:

Thus, we come down to what is tangible and conceivably practical, as the root of every real distinction of thought, no matter how subtile it may be; and there is no distinction of meaning so fine as to consist in anything but a possible difference of practice. ...

It appears, then, that the rule for attaining the third grade of clearness of apprehension is as follows: Consider what effects, that might conceivably have practical bearings, we conceive the object of our conception to have. Then, our conception of these effects is the whole of our conception of the object. [Emphasis added.]

The pragmatist looks at what things do -- at how they act upon us. The pragmatist says that if you want to know what a thing is, you should find out precisely what it does. Moreover, when you have figured out what a thing does, then there is nothing more to learn about what it is.

Applying this line of thinking to causation, the pragmatist would say that we should figure out what practical effects or consequences causal relations have. Having done that, we will have exhausted our conception of the causal relation.

The Causal Markov Axiom. One set of practical consequences of causal structure is described by the causal Markov axiom. And thus, the causal Markov axiom turns out to have interesting applications to the project of discovering causal structure. For present purposes, causal structure refers to a causal graph, which is a directed graph that represents the direct (unmediated) causes relative to some set of random variables. I will talk about "the" causal Markov axiom despite the fact that there are several alternative formulations that do not agree in all cases.

The causal Markov axiom relates a (joint) probability distribution and a causal graph as follows.

If G is a causal graph over a set of variables V (treated as vertices) and if P is a joint probability distribution over V, then according to the distribution P:

For disjoint sets A, B, and S of vertices (variables) in G, if A and B are separated by S in the moral graph of the subgraph induced by the smallest ancestral set containing the union of A, B, and S, then A is independent of B conditional on S.
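For readers who prefer symbols, the statement above can be written compactly. The notation here is an assumption of this sketch, not something fixed by the statement itself: An(X) is the smallest ancestral set containing X, G_X is the subgraph of G induced by X, a superscript m marks the moral graph, the single-bar symbol denotes separation in an undirected graph, and the double-bar symbol denotes independence under P.

```latex
A \;\perp_{\left(G_{\mathrm{An}(A \cup B \cup S)}\right)^{m}}\; B \mid S
\quad\Longrightarrow\quad
A \;\perp\!\!\!\perp_{P}\; B \mid S
```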

Terminology. If you are already familiar with the formalism behind the causal Markov axiom, feel free to skip ahead. If you aren't familiar, here are some definitions that you probably want in order to understand the condition.

Let a and b be vertices in the graph G. A set S of vertices in G is an a, b separator iff all paths from a to b include some vertex in S. Two sets A and B are separated by a set S iff for every pair <a, b> of vertices, where a is an element of A and b is an element of B, S is an a, b separator.
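The separator definition can be checked mechanically. Here is a minimal Python sketch, assuming an undirected graph stored as a dictionary from each vertex to the set of its neighbors; the function names and the three-vertex example are hypothetical, chosen only for illustration.

```python
from collections import deque


def is_separator(adj, a, b, s):
    """True iff every path from a to b in the undirected graph adj hits s."""
    s = set(s)
    if a in s or b in s:
        # Every path from a to b contains both endpoints, so it hits s.
        return True
    seen = {a}
    queue = deque([a])
    while queue:
        v = queue.popleft()
        if v == b:
            return False  # reached b without passing through s
        for w in adj.get(v, ()):
            if w not in seen and w not in s:
                seen.add(w)
                queue.append(w)
    return True


def separates(adj, set_a, set_b, s):
    """True iff s is a separator for every pair drawn from set_a and set_b."""
    return all(is_separator(adj, a, b, s) for a in set_a for b in set_b)


# Hypothetical path graph x - y - z.
adj = {"x": {"y"}, "y": {"x", "z"}, "z": {"y"}}
blocked = is_separator(adj, "x", "z", {"y"})  # True: y lies on the only path
open_path = is_separator(adj, "x", "z", set())  # False: nothing blocks x - y - z
```

The search simply refuses to step onto any vertex in the candidate separator; if b is still reachable, some path avoids the set.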

The parents of a vertex B are the vertices A such that A → B. That is, the parents of a vertex are its direct causes in the graph. The children of a vertex are defined similarly. The ancestors of a vertex B are the vertices that are parents of B or the parents of parents of B or ... and so on.


In the figure, the parents of B are the blue vertices, and the green vertices are its non-parental ancestors.
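The parent, child, and ancestor relations are easy to compute. The sketch below assumes a directed graph given as a set of (cause, effect) pairs. The example graph is hypothetical and is not the graph from the figure, although its vertex names echo the blue-parent and green-ancestor roles.

```python
def parents(edges, v):
    """Vertices with an edge pointing directly into v."""
    return {a for (a, b) in edges if b == v}


def children(edges, v):
    """Vertices that v points directly into."""
    return {b for (a, b) in edges if a == v}


def ancestors(edges, v):
    """Parents of v, parents of those parents, and so on."""
    found = set()
    frontier = [v]
    while frontier:
        node = frontier.pop()
        for p in parents(edges, node):
            if p not in found:
                found.add(p)
                frontier.append(p)
    return found


# Hypothetical graph: g1 and g2 cause p1, g2 causes p2, p1 and p2 cause b, b causes c.
edges = {("g1", "p1"), ("g2", "p1"), ("g2", "p2"),
         ("p1", "b"), ("p2", "b"), ("b", "c")}

direct_causes = parents(edges, "b")                    # {"p1", "p2"}
remote_causes = ancestors(edges, "b") - direct_causes  # {"g1", "g2"}
```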

In a directed graph (either acyclic or cyclic), a set A is called an ancestral set iff for each vertex a in A, all of the ancestors of a are contained in A. The causal Markov condition as stated above also holds for chain graphs, which include undirected edges. However, chain graphs introduce complications that I don't want to deal with in a blog post. If you are interested, see Lauritzen's book on graphical models and this paper by Richardson and Lauritzen.
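Setting chain graphs aside, the ancestral-set definition for directed graphs can be sketched in a few lines. The code assumes a graph given as a set of (cause, effect) pairs and uses the fact that a set contains all of its members' ancestors exactly when it is closed under taking parents. The function names and the example graph are hypothetical.

```python
def is_ancestral(edges, vertex_set):
    """True iff vertex_set contains every parent of each of its members."""
    vertex_set = set(vertex_set)
    return all(a in vertex_set for (a, b) in edges if b in vertex_set)


def smallest_ancestral_set(edges, seeds):
    """Close seeds under the parent relation until nothing new is added."""
    closed = set(seeds)
    changed = True
    while changed:
        changed = False
        for a, b in edges:
            if b in closed and a not in closed:
                closed.add(a)
                changed = True
    return closed


# Hypothetical graph: u -> v -> w <- x, and w -> y.
edges = {("u", "v"), ("v", "w"), ("x", "w"), ("w", "y")}

closure = smallest_ancestral_set(edges, {"w"})  # {"u", "v", "w", "x"}
```

Note that y is left out of the closure: descendants are never pulled into an ancestral set, which is what lets the axiom ignore everything downstream of the variables in question.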

Let G be an arbitrary graph. The operation of marrying two vertices consists in connecting two vertices with an undirected edge. The moral graph of G is the completely undirected graph obtained by first marrying every pair of vertices that share a child in common, then removing all arrowheads from the graph, and finally identifying all multiple edges in the graph (i.e. edges that directly connect the same two vertices).
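The three steps of moralization translate directly into code. This is a minimal sketch for a purely directed graph given as a set of (cause, effect) pairs; the result is a dictionary from each vertex to its set of neighbors, so multiple edges between the same two vertices are identified automatically. The function name and the example are hypothetical.

```python
from itertools import combinations


def moral_graph(vertices, edges):
    """Return the moral graph as a dict mapping each vertex to its neighbors."""
    adj = {v: set() for v in vertices}

    def link(x, y):
        adj[x].add(y)
        adj[y].add(x)

    # Marry every pair of vertices that share a child.
    for child in vertices:
        pars = {a for (a, b) in edges if b == child}
        for p, q in combinations(pars, 2):
            link(p, q)

    # Remove the arrowheads: each directed edge becomes an undirected one.
    for a, b in edges:
        link(a, b)
    return adj


# Hypothetical collider x -> z <- y: moralization marries x and y.
moral = moral_graph({"x", "y", "z"}, {("x", "z"), ("y", "z")})
```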

The moral graph of the graph in the figure above is:


Conveniently, the moral graph in the figure above is the moral graph of the smallest ancestral set containing the vertex B. Hence, if you want to know the conditional independence relations entailed by the causal Markov axiom to hold between B and any other vertex in the first graph above, look for separating sets in the second graph. Just to take one obvious example, the blue vertices separate B from the green vertices. So, {B} is independent of the set of green vertices conditional on the set of blue vertices.
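Putting the pieces together gives a procedure for reading off the independencies that the axiom entails: take the smallest ancestral set, moralize the induced subgraph, and search for a path that avoids the conditioning set. The sketch below assumes a directed graph given as (cause, effect) pairs and disjoint sets A, B, and S. The function name and both example graphs are hypothetical and are not the graphs from the figures.

```python
from collections import deque
from itertools import combinations


def entailed_independent(edges, A, B, S):
    """True iff S separates A from B in the moral graph of the subgraph
    induced by the smallest ancestral set containing A, B, and S."""
    A, B, S = set(A), set(B), set(S)

    # 1. Smallest ancestral set: close A | B | S under the parent relation.
    anc = A | B | S
    changed = True
    while changed:
        changed = False
        for a, b in edges:
            if b in anc and a not in anc:
                anc.add(a)
                changed = True

    # 2. Induced subgraph on the ancestral set.
    sub = [(a, b) for (a, b) in edges if a in anc and b in anc]

    # 3. Moralize: drop arrowheads and marry parents that share a child.
    adj = {v: set() for v in anc}
    for a, b in sub:
        adj[a].add(b)
        adj[b].add(a)
    for child in anc:
        pars = {a for (a, b) in sub if b == child}
        for p, q in combinations(pars, 2):
            adj[p].add(q)
            adj[q].add(p)

    # 4. Separation: search outward from A without stepping onto S.
    seen = set(A)
    queue = deque(A)
    while queue:
        v = queue.popleft()
        if v in B:
            return False
        for w in adj[v]:
            if w not in seen and w not in S:
                seen.add(w)
                queue.append(w)
    return True


# Hypothetical graph: g1, g2 -> p1; g2 -> p2; p1, p2 -> b; b -> c.
edges = {("g1", "p1"), ("g2", "p1"), ("g2", "p2"),
         ("p1", "b"), ("p2", "b"), ("b", "c")}
given_parents = entailed_independent(edges, {"b"}, {"g1", "g2"}, {"p1", "p2"})  # True
unconditional = entailed_independent(edges, {"b"}, {"g1", "g2"}, set())         # False

# Hypothetical collider x -> z <- y.
collider = {("x", "z"), ("y", "z")}
marginal = entailed_independent(collider, {"x"}, {"y"}, set())  # True
given_z = entailed_independent(collider, {"x"}, {"y"}, {"z"})   # False
```

The collider case shows why the ancestral set matters. Unconditionally, z is not in the ancestral set of x and y, so x and y are never married and the axiom entails their independence. Once z is conditioned on, it enters the ancestral set, its parents get married, and the entailment disappears.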

For your edification: unmarried parents in a graph are also called unshielded colliders, immoralities, or v-structures, depending on the author.

Axiom or Property? The causal Markov axiom is sometimes called the causal Markov condition or the causal Markov property. I'm not sure that the people who use these different labels intend to give a different sense to the axiom. However, I think there are alternative ways to formulate the axiom that actually make "condition" or "property" more appropriate. For example, one might say that a causal graph and a probability distribution (as a pair) have the causal Markov property iff ... (The dots would be replaced by the complicated statement from earlier.)

As I see it, the causal Markov axiom is a bit of metaphysics, whereas the definition of the causal Markov property is a bit of mathematics. The difference relates to my earlier discussion of truth in definitions. The causal Markov axiom makes an assertion about the relation between causal graphs and probability distributions; the definition of the causal Markov property does not. In particular, the definition of the causal Markov property does not assert that any graph-distribution pair actually satisfies the causal Markov property. As long as there are some causal graphs, the causal Markov axiom does.

Consequences of the CMA. The causal Markov axiom has two consequences that have names in the literature: robustness and the common cause (system) principle. My colleague Karen Zwier gave an interesting presentation on this point back in 2008.

In very rough form, robustness says that a variable is screened off from the rest of its ancestors by its parents. That is, a variable C is independent of the rest of its ancestors A, conditional on its parents B. We saw an example of this earlier. For a formal derivation of robustness from the causal Markov axiom, see this paper by Suarez.
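A small numerical illustration may help. The sketch below builds a hypothetical joint distribution for a chain A -> B -> C of binary variables by multiplying made-up conditional probabilities along the graph, so the distribution is Markov to the graph by construction. It then checks, with exact fractions, that A and C are associated but that the parent B screens C off from A. This only illustrates what robustness says; it is not a derivation of it.

```python
from fractions import Fraction as F
from itertools import product

# Made-up probabilities for the hypothetical chain A -> B -> C.
p_a = {0: F(7, 10), 1: F(3, 10)}
p_b_given_a = {0: {0: F(4, 5), 1: F(1, 5)}, 1: {0: F(1, 4), 1: F(3, 4)}}
p_c_given_b = {0: {0: F(9, 10), 1: F(1, 10)}, 1: {0: F(2, 5), 1: F(3, 5)}}

# Joint distribution over (a, b, c), factorized along the chain.
joint = {(a, b, c): p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]
         for a, b, c in product((0, 1), repeat=3)}

POSITION = {"a": 0, "b": 1, "c": 2}


def prob(**fixed):
    """Probability that the named variables take the given values."""
    return sum(p for outcome, p in joint.items()
               if all(outcome[POSITION[k]] == v for k, v in fixed.items()))


# A and C are associated: P(a=1, c=1) differs from P(a=1) * P(c=1).
marginally_dependent = prob(a=1, c=1) != prob(a=1) * prob(c=1)

# Screening off: P(c | a, b) equals P(c | b) for every combination of values.
screened_off = all(
    prob(a=a, b=b, c=c) / prob(a=a, b=b) == prob(b=b, c=c) / prob(b=b)
    for a, b, c in product((0, 1), repeat=3)
)
```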

In similarly rough form, the common cause (system) principle says that if two variables A and B are associated but neither variable causes the other, then there exists a collection of common causes C of A and B such that A and B are independent conditional on C. In his SEP article on Reichenbach's Common Cause Principle, Frank Arntzenius gives a proof that the common cause (system) principle is entailed by the causal Markov axiom. (See Section 1.2 and footnote number four in his article.)

Arntzenius appeals to a slightly different version of the causal Markov axiom than I have stated in this post, and the proof with the version I'm using is pretty straightforward, so I give it here.

Let A and B be two vertices in the causal graph G. Suppose that A and B are unconditionally associated but there is no directed path from A to B and no directed path from B to A. Take the contrapositive of the causal Markov axiom (as I've stated it): if A and B are unconditionally associated, then they must not be separated by the null set. So, the moral graph of the smallest ancestral set containing A and B must include at least one undirected path connecting A and B. Pick one such path. That path cannot contain a collider or at least one of the variables on the path would not be in the smallest ancestral set of A and B. But the directed edges that induce the path also cannot be uni-directional, by hypothesis. Hence, the path contains a vertex that is an ancestor of both A and B: aka, a common cause structure. The path was picked arbitrarily, so the same may be said for every path connecting A and B in the moral graph.

Now, consider the set C of all the vertices that are ancestors of both A and B. Every path connecting A and B in the moral graph of the subgraph induced by the smallest ancestral set containing A and B includes some vertex in C. Hence, A and B are separated by C, and by the causal Markov axiom, A and B are independent conditional on C.
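To see the conclusion at work in the simplest case, here is a numerical sketch with a single common cause: a hypothetical fork A <- C -> B over binary variables, with made-up probabilities multiplied along the graph. Using exact fractions, it checks that A and B are associated unconditionally and independent conditional on C. It illustrates the principle for one distribution and is no substitute for the proof.

```python
from fractions import Fraction as F
from itertools import product

# Made-up probabilities for the hypothetical fork A <- C -> B.
p_c = {0: F(1, 2), 1: F(1, 2)}
p_a_given_c = {0: {0: F(4, 5), 1: F(1, 5)}, 1: {0: F(3, 10), 1: F(7, 10)}}
p_b_given_c = {0: {0: F(9, 10), 1: F(1, 10)}, 1: {0: F(2, 5), 1: F(3, 5)}}

# Joint distribution over (a, b, c), factorized along the fork.
joint = {(a, b, c): p_c[c] * p_a_given_c[c][a] * p_b_given_c[c][b]
         for a, b, c in product((0, 1), repeat=3)}

POSITION = {"a": 0, "b": 1, "c": 2}


def prob(**fixed):
    """Probability that the named variables take the given values."""
    return sum(p for outcome, p in joint.items()
               if all(outcome[POSITION[k]] == v for k, v in fixed.items()))


# Unconditional association: P(a=1, b=1) differs from P(a=1) * P(b=1).
associated = prob(a=1, b=1) != prob(a=1) * prob(b=1)

# Conditional independence given C, written without division:
# P(a, b, c) * P(c) == P(a, c) * P(b, c) for every combination of values.
independent_given_c = all(
    prob(a=a, b=b, c=c) * prob(c=c) == prob(a=a, c=c) * prob(b=b, c=c)
    for a, b, c in product((0, 1), repeat=3)
)
```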

2 comments:

  1. Interesting. Can you give a real world example of the Causal Markov Axiom? How would this be applicable to the "common man on the street"? What are the practical applications of this?

  2. Q,

    That is a great question. Sorry it's taken me so long to reply to it. I wanted to give a decent answer, and I've been a little busy the last couple of days, anyway.

    I think there are three different ways that one might reply to your question. First, one might point to everyday examples. Second, one might point to explicit scientific applications. Third, one might point to methodological issues resolved by the causal Markov axiom (CMA). I'll give examples in order.

    1. Suppose you like to go to a local Indian restaurant, where you order either tandoori chicken or a vegetable dish. Twice after eating the chicken, you notice that you have mild stomach cramps and feel sluggish the next day. You can't remember feeling ill after eating a vegetable dish. You guess that eating chicken at that restaurant causes the problem, but how can you be sure? The causal Markov axiom recommends a procedure: plan to go to the restaurant several times. Each time, flip a coin to determine whether you will have chicken or vegetables. Record whether you feel ill the next day. If whenever you eat the chicken, you feel ill, and if you do not feel ill after eating the vegetables, then you may conclude from the CMA that the chicken causes the illness. (You can think of a similar procedure at the level of restaurants if you think that you get sick every time you eat there.)

    Actually, the causal connection doesn't need to be completely deterministic, as in the case described, but it simplifies things.

    2. The CMA has been explicitly applied to a number of different scientific problems. I applied the CMA in a blog post earlier this year discussing the relationship between poverty, education, and temperature. A couple years ago, I applied the CMA in a real paper trying to get at the causes and effects of pursuing philosophical training. And more recently, I applied it to a debate about ordinary judgments of intentional action. You can find applications of the CMA to agricultural and environmental research, epidemiology, drug studies, gene expression and micro-array data, mineralogy, economics, and on and on. Usually, the CMA is most useful where the data are noisy and the theory is sparse.

    Since causation, as opposed to statistical association, tells us something about the results of proposed actions, correct causal inferences are important for making good policy decisions.

    3. For my money, the most important applications of the CMA are to fundamental methodological issues, like (a) what justifies causal inferences from experiments and (b) what experimental designs are best for discovering causal structure. On these issues, I refer you to Frederick Eberhardt.
