The Conservative Case Against the Long-Form Census

The mandatory long-form census is -- and needs to continue to be -- a cause for action among conservatives and libertarians, particularly in the United States and Canada. The fight for the long-form census has typically come from the liberals, but lately even some conservatives are taking the bait. Strong arguments based on the resistance to Big Brother phenomena and social engineering efforts, and the support for other freedom-loving principles, militate against a mandatory long-form census. Although ideology is important, perhaps the strongest conservative case against these census forms involves their lack of scientific rigor.

Three options exist for the long-form census: (i) make it mandatory; (ii) make it voluntary; or (iii) scrap it altogether. While I prefer the third option, the mandatory versus voluntary options need to be addressed.

We know income surveys are more accurate when the survey is mandatory rather than voluntary. This seems reasonable. Income is a variable that can be verified by auditing. That said, mandatory surveys with readily verifiable variables such as income can still yield significant errors, as evidenced by the problems in our tax collection systems. But the real concerns with the long-form census do not involve basic financial and demographic information that can be readily verified. Rather, the trouble lies with the numerous questions that can never be practically verified.

In Canada, the long-form census (which was eliminated in 2010) included absurd questions such as how much housework you do, how often you interact with others, the condition of your flooring, and so on. The American Community Survey also includes a number of ridiculous inquiries, such as whether you have difficulty concentrating, remembering, or making decisions because of a physical, mental, or emotional condition, whether you have difficulty dressing or bathing, how much you think your residence is worth, how long it took you to get to work last week, and how much you spent on oil, coal, kerosene, wood, etc., for your residence over the past year (and if you haven't lived there for 12 months, you're allowed to just guess).

In these situations, one can imagine individuals becoming annoyed at filling out a long document and subsequently starting to provide nonsense for the non-verifiable and trivial questions. Income, sex, age, etc., are generally pieces of information that most individuals have at their fingertips and can fill in quickly, and/or that many individuals accept the state has some value in knowing. Flooring condition, housework details, etc., not only require more thought for many, but also invoke a 'who cares' or 'none of your business' response that can impact the accuracy of the information provided.

There are also problems with subjective category responses and awareness/assessment issues -- and even how to define a topic -- in various long-form census questions. What one person defines as housework may not match another individual's criteria. Defective plumbing to one person may not be defective to another, and what about individuals who have defective plumbing and don't know it? What about those who do not have defective plumbing but think they do? What one person defines as difficulty in concentrating, remembering, making decisions, dressing, or bathing is likely very different from another person's definition.

These types of issues have been previously discussed in the social sciences (as evidenced by the following quote from the text "Surveys in Social Research" by David de Vaus), but the long-form census proponents repeatedly avoid them.

"Voluntary participation, however, conflicts with the methodological principle of representative sampling. Given the choice, certain types of people (e.g. those with lower levels of education, from non-English-speaking backgrounds) are more likely than others to decline to participate in surveys and can result in biased samples. However, compulsory participation is not the solution. Although compulsion might minimize bias it will undermine the quality of the responses."

This is a very real problem in statistics. For verifiable variables, we can effectively sample the population using voluntary and mandatory surveys and then compare the results to fully audited investigations. But what about very personal details of people's lives where verification is impossible or impractical?
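For a verifiable variable such as income, the comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `reporting_error` and every figure below are hypothetical, invented for the example, and are not drawn from any census agency or published survey.

```python
from statistics import mean


def reporting_error(reported, audited):
    """Compare self-reported values with audited values for the same respondents.

    Both lists must be in the same respondent order. Hypothetical helper,
    written for illustration only.
    """
    if len(reported) != len(audited):
        raise ValueError("reported and audited must have the same length")
    if not reported:
        raise ValueError("need at least one respondent")
    diffs = [r - a for r, a in zip(reported, audited)]
    return {
        # Systematic over-reporting (positive) or under-reporting (negative).
        "mean_error": mean(diffs),
        # Typical size of a misreport, regardless of direction.
        "mean_abs_error": mean([abs(d) for d in diffs]),
    }


# Invented incomes for four respondents; not real data.
audited = [30000, 45000, 60000, 80000]
survey_a = [30000, 44000, 60000, 78000]
survey_b = [30000, 45000, 55000, 70000]

print(reporting_error(survey_a, audited))
print(reporting_error(survey_b, audited))
```

The calculation depends entirely on the `audited` list. For questions about housework or flooring condition no such list exists, so the error of either survey design simply cannot be computed.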

Yet most social scientists fail to address the looming elephant in the room. Much of the data collected by census agencies and their analogs in other government departments is unreliable for the simple reason that a substantial portion of this data is entirely unverifiable. Regardless of whether a survey is mandatory or voluntary, if you ask someone a question to which you cannot reasonably verify the answer, then you have no idea as to the accuracy of the data. Nobody appears to be rigorously accounting for these issues when using much of the government's data, or when the government decides to spend taxpayer money to acquire these types of flawed data.

It's a fantasyland of assuming the underlying data is valid, when it may not be, leading to essentially junk data that census agencies and other government departments -- and almost all researchers in the social sciences and humanities -- have been generating for decades. When researchers speak of "information-rich surveys" by census agencies, we must not get confused. There is a lot of data, but often little information as much of the data is unreliable.

Proponents of the long-form census often make the following type of comment:

"Evidence-based policy-making requires just that -- evidence -- standard, reliable metrics whose quantification and legitimacy is widely agreed upon. In their absence, policy-making at all levels and in every sector will be as expensive as it is hopeful, while policy actors are forced to gingerly 'guess and check' over time. In the absence of good data, our ability to fully comprehend complex policy issues will grow anecdotal and inconsistent."

Guess what? We've been making poke-and-pray policies for a long time using census datasets and other surveys because this so-called evidence is unreliable. It is most often just hearsay, and as such, it should be inadmissible in public policy formulation. An absence of data is better than a wealth of bad data.

Amid all these calls for evidence-based policy making and the claims that conservative administrations are anti-science, one actually finds the shoe is on the other foot, and that many of the so-called pro-science individuals are actually anti-science and/or pro-junk science. It is, in fact, the proliferation of junk evidence that we are finally climbing out of in some nations. Long-form census proponents are correct in stating that "in the absence of good data, our ability to fully comprehend complex policy issues will grow anecdotal and inconsistent." That has been the situation for some time (i.e., we've had bad data that many have claimed as good data), and it undercuts the proponents' claims regarding the necessity of much of the long-form census data. Policy actors have always had to "gingerly 'guess and check' over time," and unless we develop and implement mass-scale mind-reading devices and/or a Big Brotheresque state that is all-knowledgeable at a proven factual level, this will likely always be the case.

What's truly problematic is that we've been generating and using this junk data for so long. Coupled with the threats it poses to privacy and liberty, the bad science behind much of the long-form census will hopefully help us put a stake through the heart of this government-mandated nonsense.

Sierra Rayne holds a Ph.D. in Chemistry and writes regularly on environment, energy, and national security topics. He can be found on Twitter at @rayne_sierra.