Spurious correlations

Almost every day, we learn of some discovery that a disease is “linked with” some behavior or some chemical; or that some national trend is “linked with” another phenomenon.  We might ask, “What does ‘linked with’ really mean?”

The words “linked with” mean that two things are correlated, and in everyday usage this concept covers a vast range of relationships.  Correlation is a legitimate statistical term with a numerical value: by comparing two sets of data point by point, it is possible to calculate a correlation coefficient, a number between -1 and +1 that measures how closely the two sets move together.  Whether there is any meaning to that number, or any actual cause-and-effect relationship, is an entirely separate matter.
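As a rough illustration of what "calculating a correlation coefficient" involves, here is a minimal Python sketch of the standard Pearson formula. The function name `pearson` and the two short data series are made up for this example and are not real statistics.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # How much the two series vary together around their means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much each series varies on its own.
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)

# Made-up illustrative numbers, not real data.
series_a = [1.0, 2.0, 3.0, 4.0, 5.0]
series_b = [2.1, 3.9, 6.2, 8.1, 9.8]
r = pearson(series_a, series_b)
print(f"r = {r:.3f}")
```

A value near +1 means the two series rise and fall together, a value near -1 means one rises as the other falls, and a value near 0 means no straight-line relationship. Nothing in the arithmetic knows or cares whether one series causes the other.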

Some connections are obvious, such as the linkage between drunk driving and auto accidents, or between smoking and lung cancer.  The cause and effect are quite clear.  But many other connections may or may not be real.

A surprising number of things are correlated with each other and yet have no connection whatsoever.  Here is a website where many of them are compiled.

This website was made by a grad student at Harvard, who wrote a computer program to gather statistics from various sources and compute how well different things match up.  Even though there is no real connection at all, the trend lines in certain pairs of data sets go together surprisingly well.  In some cases the correlation coefficient was extremely high, around 0.99 (99%).

Here are two examples:

The age of Miss America correlates well with the number of murders by steam and hot objects.

Can you find a “cause and effect” relationship here?  As fishing becomes safer over time, Kentuckians seem to get married less.

This website goes on and on, with many more absurd examples.  You’d be amazed at what correlates with Nicolas Cage movies!

This website is an enjoyable place to visit around April Fools’ Day.  However, there is a serious point to be made: just because two things match up, it doesn’t mean that one causes the other.  The entire point of carrying out those computational exercises was to illustrate the old saying that “correlation is not causation.”
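The effect of comparing many unrelated data sets and keeping only the best matches can be reproduced with nothing but random numbers. The sketch below is an assumption-laden toy, not the website's actual program: the helper names (`random_walk`, `best_pair`), the choice of 200 series of 10 points each, and the random seed are all invented for illustration.

```python
import itertools
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

def random_walk(rng, length):
    """A made-up 'yearly statistic': each value is the last one plus noise."""
    value, walk = 0.0, []
    for _ in range(length):
        value += rng.gauss(0.0, 1.0)
        walk.append(value)
    return walk

def best_pair(series):
    """Return (r, i, j) for the most strongly correlated pair of series."""
    return max(
        (pearson(a, b), i, j)
        for (i, a), (j, b) in itertools.combinations(enumerate(series), 2)
    )

rng = random.Random(0)  # fixed seed so the run is repeatable
# 200 series that are unrelated by construction, 10 "years" each.
series = [random_walk(rng, 10) for _ in range(200)]
r, i, j = best_pair(series)
print(f"series {i} and series {j} have r = {r:.3f}")
```

With 200 series there are 19,900 possible pairs, so a few of them are bound to line up closely by chance alone. The short length of each series matters too: with only ten points per series, a near-perfect match is much easier to stumble upon.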

When somebody comes on TV with an exciting announcement of some new linkage, think twice about whether or not the connection is plausible.
