The Correctional Offender Management Profiling for Alternative Sanctions, or COMPAS, program is a complex piece of software that is intended to predict which people in the criminal justice system are likely to be repeat offenders. The algorithm it uses to generate mathematical predictions is based on a set of 137 aspects of the person’s personal history, their record, and their likely conditions in the future.

But the sophisticated computer prognostication system is no more accurate than crowdsourcing non-experts based on just two factors, claims a new study by Dartmouth College computer scientists.

“People from a popular online crowdsourcing marketplace—who, it can reasonably be assumed, have little to no expertise in criminal justice—are as accurate and fair as COMPAS at predicting recidivism,” they write this week in the journal Science Advances. “Although COMPAS uses 137 features to make a prediction, the same predictive accuracy can be achieved with only two features.”

The company that makes the software, Northpointe, took issue with several key aspects of the paper, it said in a statement to Forensic Magazine.

The tests were run on 1,000 pretrial defendants from Broward County, Florida, who were appearing in court in the years 2013 and 2014. The COMPAS software was used to predict whether they would again be arrested for a misdemeanor or felony in the next two years, they write.

At the same time, the researchers compared the accuracy of the advanced predictive software against the subjective judgments of a group of people. The participants were recruited through Amazon’s Mechanical Turk, a crowdsourced, on-demand work marketplace where people do tasks for money, via a “predicting crime” task. They were told they would be paid $1 for completing the entire task, with a $5 bonus for getting more than 65 percent of the predictions right in singling out who would commit crimes in the future, based on a few sentences’ description of each defendant. (Three questions among the 50, which involved a bit of reading comprehension, were thrown into the mix randomly to ensure participants were not just clicking random answers.)

The final results: the COMPAS software was 65.2 percent accurate, and the crowdsourced humans were 67 percent accurate.

“Although Northpointe does not reveal the details of their COMPAS software, we have shown that their prediction algorithm is equivalent to a simple linear classifier,” they write. “In addition, despite the impressive-sounding use of 137 features, it would appear that a linear classifier based on only two features—age and total number of previous convictions—is all that is required to yield the same prediction accuracy as COMPAS.”
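A linear classifier of the kind the researchers describe can be sketched in a few lines. The weights, threshold, and toy records below are illustrative assumptions for demonstration only, not the study’s fitted model or its data:

```python
# Minimal sketch of a two-feature linear classifier (age, prior convictions),
# in the spirit of the Dartmouth finding. Weights and records are hypothetical.

def predict_reoffend(age, priors, w_age=-0.05, w_priors=0.3, bias=0.0):
    """Linear score: being younger and having more priors push toward 'yes'."""
    score = w_age * age + w_priors * priors + bias
    return score > 0  # predict re-arrest if the score crosses the threshold

# Hypothetical defendants as (age, number of prior convictions) pairs
records = [(23, 4), (52, 0), (31, 1), (19, 6)]
predictions = [predict_reoffend(a, p) for a, p in records]
print(predictions)  # [True, False, False, True]
```

The point of the study is that a model this simple, once its two weights are fit to real data, can match the predictive accuracy of a 137-feature commercial tool.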

A spokesperson for Northpointe, which now does business as Equivant, said the methodology contains “serious errors.” The company intends to request the data and peer review the results, the spokesperson explained.

Among the company’s complaints: the 137 inputs are not used in the risk assessment; those 137 are needs factors, and the risk factors number only six, the company said. Another complaint was “limited internal validation,” since a two-feature risk scale could have “over-fit” the data, inflating its apparent accuracy. Most interestingly, the company cited the rough equivalency between the computer’s accuracy and the crowd’s as proof that the system works.

“The findings of ‘virtually equal predictive accuracy’ in this study, instead of being a criticism of the COMPAS assessment, actually adds to the growing number of independent studies that have confirmed that COMPAS achieves good predictability and matches the increasingly accepted AUC standard of 0.70 for well-designed risk assessment tools used in criminal justice.”

Julia Dressel, one of the authors of the study, which was part of her undergraduate thesis in computer science, said their work did not justify using the software to make life-altering decisions on those accused of crimes.

“Claims that secretive and seemingly sophisticated data tools are more accurate and fair than humans are simply not supported by our research findings,” said Dressel, in a school statement.

COMPAS has been used to assess more than 1 million offenders in the United States since the recidivism prediction component debuted in 2000, according to the researchers.

Use of the software drew scrutiny with a 2016 investigation by ProPublica titled “Machine Bias,” which alleged a racial disparity in the predictions: black arrestees were over-predicted to re-offend, while white arrestees were under-predicted to re-offend.