How to see through bad coronavirus statistics

Over the past few months, we have seen many coronavirus statistics.  We have also seen many statistical errors, which have made some of us skeptical.  My complaint is that not enough of us have been skeptical.  From the point of view of someone who uses statistics as part of his work, these errors are elementary.  Allow me to name a few of them and show how they can deceive people.

Choose which attribute to measure (deaths versus cases).  The COVID-19 pandemic is an experiment with seven billion guinea pigs: us.  At the beginning of the experiment, both cases and deaths were counted.  A case was determined by a doctor carefully examining the patient and deciding whether the illness was COVID-19.  A death was determined by a doctor following the patient until the patient's unfortunate demise and again deciding whether COVID-19 was the cause.  The latter decision is likely more accurate, so deaths are the more reliable statistic.  Counting cases is still useful, because dividing deaths by cases gives an estimate of the probability of death given infection.  In the first few months, this estimate varied a great deal, which means that at least one of the two statistics was poorly measured.  The inaccurate statistic is undoubtedly the number of cases.
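A minimal sketch of the arithmetic behind the probability of death given infection: divide deaths by cases. The counts below are hypothetical, chosen only to show that if deaths stay fixed while the definition of a case changes, the estimated probability moves even though nothing about the disease has changed.

```python
def death_rate_given_infection(deaths: int, cases: int) -> float:
    """Estimate the probability of death given infection as deaths / cases."""
    if cases <= 0:
        raise ValueError("cases must be positive")
    return deaths / cases


# Hypothetical counts: the same 100 deaths, with cases counted two different ways.
deaths = 100
strict_cases = 1_000   # hypothetical: only patients examined by a doctor
broad_cases = 10_000   # hypothetical: a looser definition that also captures milder infections

print(death_rate_given_infection(deaths, strict_cases))  # 0.1
print(death_rate_given_infection(deaths, broad_cases))   # 0.01
```

The same deaths yield an estimate ten times smaller under the broader definition, which is why the way cases are counted matters as much as the count itself.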

Measure the attribute the same way throughout the experiment (avoid changing the way "cases" is defined).  Initially, cases were defined by a doctor's examination.  Later, a test for the virus was used.  Later still, a test for either the virus or its antibodies was used.  Currently, the patient and anyone with whom he has had recent contact are called "cases."  The CDC is responsible for much of this; for example, it has recently been asking states to change the way they define cases.  As if that were not confusing enough, only some states are making the change, so statistics will not be comparable from one state to the next.

Look for problems with data integrity.  There have been accusations that people are fudging the data; for example, the CDC has been accused of changing old statistics.

Normalize the data (take into account different population sizes).  COVID-19 statistics are often displayed state by state.  Does a state-by-state chart show that New York and California have a lot of COVID-19 deaths, or just a lot of people?  It turns out that, as of 17 July 2020, New York had 167 COVID-19 deaths per 100,000 people, while California had just 19.
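Normalizing is one line of arithmetic: deaths times 100,000, divided by population. The two states in this sketch are hypothetical and their numbers are made up for illustration; only the final comparison uses the figures quoted above for New York and California.

```python
def per_100k(deaths: int, population: int) -> float:
    """Return deaths per 100,000 people (multiplying first reduces rounding error)."""
    if population <= 0:
        raise ValueError("population must be positive")
    return deaths * 100_000 / population


# Hypothetical states: A has four times as many deaths as B, but ten times as many people.
rate_a = per_100k(deaths=2_000, population=10_000_000)
rate_b = per_100k(deaths=500, population=1_000_000)
print(rate_a, rate_b)  # 20.0 50.0

# The article's figures as of 17 July 2020, already expressed per 100,000 people.
new_york, california = 167, 19
print(round(new_york / california, 1))  # 8.8
```

State A looks worse by raw deaths but is better once population is taken into account, and the quoted figures put New York's rate at roughly 8.8 times California's.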

What are the consequences of these statistical flaws?  Because of the change in how cases are counted, the number of cases has risen greatly over the past month.  Its trend looks so different from the COVID-19 death trend that it is obviously flawed.  From the point of view of the typical reporter, cases make better news, and they beat up on President Trump.  Now Florida mistakenly looks like a leper colony.  Even though Florida has only one eighth as many deaths per 100,000 people as New York, New York governor Andrew Cuomo declared that people who visit Florida and return to New York must quarantine themselves for 14 days.  Cuomo, whose state is tied with New Jersey as the worst in the union as measured by deaths, is jealously blaming a better-run state for supposedly being worse than New York.  Cuomo should put his emotions aside and try to improve his job performance.

The conclusion is that we should look at normalized deaths and make sure they include only deaths actually caused by COVID-19.

Readers of American Thinker are not dummies.  You read these articles because you want to understand the world better.  Hopefully, you now have a better idea of how statistics can be mismanaged to produce incorrect conclusions, or the conclusions that the wrong people want.  You can now tell your liberal friends to avoid these mistakes.  Unfortunately, this requires that they do some arithmetic, and most people prefer to let those with journalism degrees do their math for them.