# Fun with Statistics and the Gun Control Debate

Gun control advocates often contend that more gun deaths occur when guns are abundant in society.  The statistics they use to make the case are not persuasive.

The most commonly cited evidence for the linkage is that the states with higher gun death rates also tend to be the states in which gun owners are a higher percentage of the population.

The scatter plot in Figure 1 compares the gun death rate to the gun ownership rate.  One can readily see that states with lots of gun owners also tend to have high gun death rates.  It is not a perfect relationship, but any objective observer will accept that the scatter plot suggests a linkage.

The problem is that gun control advocates often treat this scatter plot as prima facie evidence when in fact it suggests only a possible relationship.  Because this statistical "science" appears to support their theory, gun control advocates rarely care to look into the matter any more deeply.  This is a subtle form of confirmation bias.

The scatter plot in Figure 1 is based on the most commonly used data (CDC and Kalesan et al.).  It is an accurate representation of what is referred to in statistics as linear regression.  On the vertical axis is plotted the phenomenon we are trying to explain: the gun death rate.  By convention, the horizontal axis is used to plot the theorized cause of variations in the gun death rate, in this instance the gun ownership rate.  Each point represents an individual state.  The straight line running through the points (the regression line) is placed so as to minimize the squared vertical distances to the points, and its slope summarizes the average amount of change in the gun death rate associated with a given amount of change in the gun ownership rate.
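For readers who want to see how such a line is arrived at, here is a minimal sketch of an ordinary least-squares fit.  The function name `fit_line` and the five data points are made up for illustration; they are not the CDC or Kalesan et al. figures behind Figure 1.

```python
# Illustrative only: hypothetical numbers, not the data plotted in Figure 1.

def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through the point of averages.
    intercept = mean_y - slope * mean_x
    return slope, intercept


ownership = [10.0, 20.0, 30.0, 40.0, 50.0]   # hypothetical ownership rate (%)
death_rate = [4.0, 7.0, 9.0, 14.0, 16.0]     # hypothetical deaths per 100,000

slope, intercept = fit_line(ownership, death_rate)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
```

In this made-up example the slope works out to 0.31, meaning each additional percentage point of ownership goes along with about 0.31 more deaths per 100,000.  Note that the slope describes an association in the data; nothing in the arithmetic says which variable drives the other.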

There are all sorts of problems with the accuracy and the relevance of the data used to construct this scatter plot.  A serious evaluation of the possible relationship between gun deaths and gun availability would call for better data.  But since the objective in this essay is only to reveal some pitfalls associated with conclusions based on regression and correlation, there is good reason to use the same data sources as those who draw faulty conclusions from them.

Statisticians rely on a computed number known as the correlation coefficient to measure the degree to which the plotted points line up on the regression line.  If all plotted points fall on the line, the correlation coefficient will be one (or negative one, if the line slopes downward).  If the points are randomly scattered, then it will approach zero.  Virtually any scatter plot that involves human behavior rather than physical processes will fall somewhere between a perfect fit and no fit at all.
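The computation itself is short.  The sketch below implements the standard Pearson correlation coefficient in plain Python; the helper name `pearson_r` and the toy data are assumptions chosen only to show the extremes described above.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
rising = [2.0 * x + 1.0 for x in xs]      # every point on an upward line
falling = [10.0 - 3.0 * x for x in xs]    # every point on a downward line
scattered = [2.0, 5.0, 1.0, 4.0, 3.0]     # toy values with no clear trend

for label, ys in [("rising", rising), ("falling", falling), ("scattered", scattered)]:
    print(f"{label:>9}: r = {pearson_r(xs, ys):+.3f}")
```

The first two cases give a coefficient of one and negative one, while the scattered values land close to zero.  Real data on human behavior falls in between.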

The scatter plot presented in Figure 1 has a computed correlation coefficient of 0.697.  This is over two-thirds of the way from zero to a perfect one and suggests a strong relationship between gun deaths and gun ownership.

But sometimes a lot can be learned by comparing the correlation coefficients for two different relationships.  Let's look at data for vehicular death rates and vehicle ownership rates.  For both gun deaths and highway deaths, we might expect that a higher prevalence of the device involved (guns or cars) will contribute to the likelihood of the event occurring.  Once again, the data are taken from the most commonly used sources (IIHS and Wikipedia).

Figure 2 shows the scatter plots and the correlation coefficients for both the gun death question and the highway death question.  The correlation coefficient for explaining gun deaths is much higher than that for explaining vehicle-related deaths: 0.697 vs. 0.355.  The relationship between gun deaths and gun ownership is much stronger than the relationship between car deaths and car ownership.  The "guns cause gun deaths" theory continues to hold up, whereas the "cars cause car deaths" theory looks weak.  The two theories rely on comparable logic, but the statistics seem to give more credence to the "guns cause gun deaths" theory.

But now let's attempt what for a social scientist may be the closest thing to a controlled experiment.  Let's test the anti-theories.  Let's use the same methodology to see whether car ownership causes gun deaths and whether gun ownership causes car deaths.  Both propositions defy logic, so we would expect their correlation coefficients to approach zero.  If they do not, then correlation may be less useful than expected, since it would fail to distinguish between reasonable and unreasonable indicators of causality.

Figure 3 shows the results.

For the proposition that car ownership causes gun deaths, the correlation coefficient is 0.349.  For the idea that gun ownership causes car deaths, it is 0.625.  All of a sudden, things are getting confusing.  Can we really believe that gun ownership is almost as good an explanation for car deaths as it is for gun deaths (0.625 vs. 0.697)?  Can we honestly contend that car ownership is as good (but not very good) at explaining gun deaths as it is at explaining car deaths (0.349 vs. 0.355)?
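Anyone with the four columns of state data in hand can run the same cross-check in a few lines.  The sketch below is one way to do it; the column names and every number in `columns` are hypothetical stand-ins, not the CDC, Kalesan et al., IIHS, or Wikipedia figures, so its output will not reproduce the coefficients quoted above.

```python
import math
from itertools import combinations


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def pairwise_correlations(columns):
    """Map each unordered pair of column names to its correlation."""
    return {
        (a, b): pearson_r(columns[a], columns[b])
        for a, b in combinations(columns, 2)
    }


# Hypothetical values for six imaginary states (one list entry per state).
columns = {
    "gun_ownership": [12.0, 25.0, 31.0, 40.0, 47.0, 55.0],
    "gun_deaths": [5.0, 9.0, 8.0, 13.0, 15.0, 17.0],
    "car_ownership": [70.0, 82.0, 75.0, 88.0, 80.0, 91.0],
    "car_deaths": [6.0, 10.0, 9.0, 12.0, 16.0, 15.0],
}

table = pairwise_correlations(columns)
for (a, b), r in table.items():
    print(f"{a:>14} vs {b:<14} r = {r:+.3f}")
```

Each pair appears only once because the coefficient is symmetric: swapping the two columns gives the same number.  The statistic therefore cannot say which variable is the supposed cause and which the effect.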

Finally, here is the ultimate absurdity.  Figure 4 provides the scatter plot and correlation coefficient for the silly idea that car deaths cause gun deaths.  The correlation coefficient of 0.788 (!) is higher than all the others we have seen.  Shall we conclude that car deaths do more to explain gun deaths than gun ownership does?  To do otherwise would be to value intuition and perception over statistical measurements — not exactly a scientific approach to the matter.

Maybe some as-yet-undiscovered factor (what statisticians call a confounding variable) accounts for variation in both gun deaths and vehicle deaths.  Perhaps more likely is that the data are seriously flawed.  In any event, the simple statistical approach so commonly used to ascertain the "cause" of gun deaths is inadequate.
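The first possibility is easy to demonstrate with simulated numbers.  In the sketch below, a made-up hidden factor drives two series that have no influence on each other, yet the two series end up strongly correlated.  Everything here is an assumption for illustration (the names `hidden`, `a`, `b`, the noise levels, the sample size); it shows that a shared factor can produce such a pattern, not that one is at work in the actual state data.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def residuals(ys, xs):
    """What is left of ys after subtracting its least-squares fit on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    return [(y - mean_y) - slope * (x - mean_x) for x, y in zip(xs, ys)]


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000
hidden = [rng.gauss(0.0, 1.0) for _ in range(n)]   # the shared factor
a = [h + rng.gauss(0.0, 0.5) for h in hidden]      # depends only on hidden
b = [h + rng.gauss(0.0, 0.5) for h in hidden]      # depends only on hidden

raw_r = pearson_r(a, b)
# Correlation that remains once the hidden factor is accounted for.
adjusted_r = pearson_r(residuals(a, hidden), residuals(b, hidden))
print(f"raw r = {raw_r:+.3f}, after removing hidden factor r = {adjusted_r:+.3f}")
```

The raw coefficient is high even though neither series affects the other, and it collapses toward zero once the shared factor is removed.  Of course, that adjustment is only possible when the hidden factor has been identified and measured.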

Before we use statistics to address a problem of this sort, we must become knowledgeable about guns and data and how to interpret correlation and regression.  Before we buy into "causality," we need to know the dangers of accepting the superficially self-evident.
