A new study that relies on machine learning methods found that 19 percent more COVID-19 deaths occurred than official records indicate. The authors say the study reveals critical gaps in the U.S. death investigation system, and the findings are prompting renewed calls for reform.
“Without an accurate count of who is dying and where, public health resources can’t reach the communities that need them most,” said study senior and corresponding author Andrew Stokes, associate professor of global health at Boston University School of Public Health. “The COVID-19 pandemic exposed the long-standing gaps within the American death investigation system, and the changes that are urgently needed to improve the quality of cause-of-death data—for all deaths, not just those caused by COVID-19.”
The study did not directly examine why many death certificates list inaccurate causes of death, but other sources point to COVID-era challenges such as inadequate staffing to conduct postmortem COVID-19 testing, a lack of standardized training and protocols for death investigators, and partisan beliefs that may cloud investigators’ judgment, particularly among county coroners, who are elected and are not required to have medical backgrounds.
Overall, the researchers recommend increased funding and training for death investigators, as well as increased hiring of medical examiners.
Machine learning methods
Since the early days of the pandemic, Stokes has led a team of researchers in building a body of research on excess mortality, aiming to capture the pandemic’s hidden death toll. What sets this most recent study apart is the machine learning algorithms behind the estimates.
The new algorithms are more reliable than the excess mortality models in the team’s prior work because the algorithms were informed by federal health datasets of hospital-verified, inpatient COVID-19 death information, as opposed to estimates derived from broader all-cause mortality data. Previously, few studies had applied this type of machine learning modeling to track mortality trends.
“Because COVID-19 testing was near-universal in hospitals, deaths in those settings were more likely to be accurately classified,” said Stokes. “That gave us a strong foundation to train our model, which we then applied to out-of-hospital settings, where testing was far less consistent and COVID-19 deaths were more likely to go unrecognized.”
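The train-in-hospital, predict-out-of-hospital idea Stokes describes can be sketched in a few lines. This is only an illustration under loud assumptions: a simple stratified-rate estimate stands in for the study's machine learning model, the two features (age group and respiratory symptoms) are hypothetical, and every record below is invented rather than drawn from the study.

```python
from collections import defaultdict

# Hypothetical toy records: (age_group, respiratory_symptoms, recorded_as_covid).
# All values are invented for illustration; none come from the study.
HOSPITAL_DEATHS = (
    [("65-84", True, True)] * 8 + [("65-84", True, False)] * 2
    + [("65-84", False, True)] * 1 + [("65-84", False, False)] * 9
    + [("under-65", True, True)] * 3 + [("under-65", True, False)] * 2
    + [("under-65", False, False)] * 5
)
HOME_DEATHS = (
    [("65-84", True, True)] * 2 + [("65-84", True, False)] * 8
    + [("65-84", False, False)] * 10
    + [("under-65", True, True)] * 1 + [("under-65", True, False)] * 4
)


def fit_rates(labeled):
    """Estimate the share of COVID-19 deaths per feature combination,
    using only deaths from the well-tested (in-hospital) setting."""
    counts = defaultdict(lambda: [0, 0])  # features -> [covid deaths, all deaths]
    for age, symptoms, is_covid in labeled:
        counts[(age, symptoms)][0] += int(is_covid)
        counts[(age, symptoms)][1] += 1
    return {key: covid / total for key, (covid, total) in counts.items()}


def predict_total(rates, records):
    """Sum the expected COVID-19 probability over records, ignoring their own labels."""
    return sum(rates.get((age, symptoms), 0.0) for age, symptoms, _ in records)


rates = fit_rates(HOSPITAL_DEATHS)
predicted = predict_total(rates, HOME_DEATHS)
recorded = sum(is_covid for _, _, is_covid in HOME_DEATHS)
print(f"recorded={recorded}, predicted={predicted:.1f}")
```

With these invented numbers, the home setting records 3 COVID-19 deaths while the rates learned in hospitals predict about 12. The gap between the two is the kind of undercount the study set out to estimate, although the real model and data are far richer than this toy.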
According to the study results, published in Science Advances, more than 155,000 U.S. deaths between March 2020 and December 2021 were not officially recorded as COVID-19 deaths. In fact, the number of COVID-19 deaths that occurred at home was 160 percent higher than official records reflected, suggesting there could have been more than 111,000 uncounted COVID-19 deaths that occurred in homes.
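As a rough back-of-envelope check, the percentages and counts above imply approximate official totals. The sketch below assumes the 19 percent and 160 percent figures apply directly to the 155,000 and 111,000 counts; because the article gives those counts as lower bounds and the percentages are rounded, the derived numbers are approximations, not figures reported by the study.

```python
# Figures quoted in the article (the counts are lower bounds, "more than ...").
uncounted_total = 155_000       # deaths not officially recorded as COVID-19
excess_share_total = 0.19       # 19 percent more deaths than official records
uncounted_home = 111_000        # uncounted COVID-19 deaths at home
excess_share_home = 1.60        # at-home deaths 160 percent higher than recorded

# If uncounted = share * official, then official = uncounted / share.
implied_official_total = uncounted_total / excess_share_total
implied_official_home = uncounted_home / excess_share_home

print(f"implied official total:   about {implied_official_total:,.0f}")
print(f"implied official at home: about {implied_official_home:,.0f}")
```

Under these assumptions the official records would hold roughly 816,000 COVID-19 deaths overall and roughly 69,000 at home for the study period.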
The study results also show that deaths from COVID-19 were more likely to be overlooked among people in the South. Alabama, for example, had 67 percent more total predicted COVID-19 deaths than officially reported.
COVID-19 deaths were also more likely to be uncounted among people between the ages of 65 and 84; males; those who did not have a high school education; people who identified as Hispanic, American Indian and Alaska Native, Asian, or Black; and those who lived in counties with lower socioeconomic status and more pre-pandemic health issues. Many of the predicted uncounted COVID-19 deaths were instead attributed to underlying causes such as Alzheimer’s disease and related dementia, cardiovascular disease, and diabetes.
“Under-counting inequities in COVID-19 deaths can be viewed as both a manifestation of structural racism, ableism, and classism and as a mechanism preventing responsive policy action,” said study co-author Dielle Lundberg, research fellow at Boston University School of Public Health. “Undercounts functioned to absolve health policymakers of their failures to enact pragmatic health policies during the pandemic.”
While machine learning algorithms cannot replace broader systemic reform, the study authors do note that the methods could potentially be adapted to other settings where cause-of-death data may be incomplete, delayed, or suspected to be biased, including drug overdoses and deaths in police custody.