Can human judgment and decision making, adjusting constantly to a changing set of circumstances, be quantified for juries and judges?

Likelihood ratios (LRs), a complex statistical method, attempt to do just that in forensic science by putting numbers to the complexities of crime solving and detection. The LR concept is increasingly being used in European courts and elsewhere.

But LRs are intensely debated in much of the world, especially in the U.S., amid growing pressure to overhaul forensic science in favor of reproducible, quantifiable evidence.

A new study by two statisticians at the National Institute of Standards and Technology (NIST) contends LRs should be used sparingly in American courtrooms, since the attempt at objectivity is actually only a mathematical model of an expert’s subjective reasoning process.

“We are not suggesting that LR should never be used in court, but its envisioned role as the default or exclusive way to transfer information is unjustified,” said Steve Lund, one of the statisticians, in a NIST statement on the work. “Bayesian theory does not support using an expert’s opinion, even when expressed numerically, as a universal weight of evidence. Among different ways of presenting information, it has not been shown that LR is most appropriate.”

The LR concept grew out of the work of the Reverend Thomas Bayes in 18th-century England, which offered a way to marry probabilities to observations and decisions. The modern breakthrough in its usage came from Alan Turing, who used LRs to help crack the Germans’ Enigma code during World War II, providing a key advantage to the eventually victorious Allies.

A simplified example of the use of LR was presented in the new NIST paper. Two urns each hold 100 balls. Urn 1 has 99 red balls and one green ball; Urn 2 has 99 green balls and one red ball. One of the urns is picked at random, and its balls are thoroughly mixed. The observer, if guessing which urn was picked at this point, would have a 50-50 chance of getting it right. But then a ball is selected, and it is red. That single selection makes it very likely that the randomly chosen urn is Urn 1, where 99 percent of the balls are red (as opposed to Urn 2, where selecting the single red ball on the first pull would be very unlikely). The observer would thus update their belief about which urn had been picked, based on the new information. This is a kind of Bayesian decision making, accounting for a changing flow of information.
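The arithmetic behind the urn example can be sketched in a few lines of Python. This is only an illustration of the standard odds form of Bayes' rule (posterior odds = prior odds × LR) applied to the numbers given above; the variable names are made up for this sketch and are not taken from the NIST paper.

```python
from fractions import Fraction

# Probability of drawing a red ball from each urn (from the example above).
p_red_given_urn1 = Fraction(99, 100)
p_red_given_urn2 = Fraction(1, 100)

# Likelihood ratio: how much more probable a red ball is under
# "Urn 1 was picked" than under "Urn 2 was picked".
likelihood_ratio = p_red_given_urn1 / p_red_given_urn2

# Before any ball is drawn, the two urns are equally likely (50-50),
# so the prior odds for Urn 1 against Urn 2 are 1 to 1.
prior_odds = Fraction(1, 1)

# Odds form of Bayes' rule: posterior odds = prior odds * likelihood ratio.
posterior_odds = prior_odds * likelihood_ratio

# Convert the odds back into a probability that Urn 1 was picked.
posterior_prob_urn1 = posterior_odds / (1 + posterior_odds)

print("Likelihood ratio:", likelihood_ratio)
print("Posterior probability of Urn 1:", float(posterior_prob_urn1))
```

With these inputs the LR is 99, and the observer's belief in Urn 1 moves from 50 percent to 99 percent. A different starting belief (a different `prior_odds`) would give a different answer from the same LR, which is why the prior is kept as a separate input here.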

(But the use of LR often involves a much more complicated set of circumstances: for instance, where each urn had an equal 50 red balls and 50 green balls, and three balls were selected.)

In increasingly complex situations, an expert presenting such decision-making theories in a courtroom would only be presenting their own individual reasoning process, and not a universal language that the jury could justly interpret, the NIST statisticians contend.

In the wider context of a criminal trial, involving other experts, detectives and numerous other players and factors, a single expert’s reasoning does not warrant being reduced to a number, the pair argue.

“Thus, while decision theory may have a normative role in how a 'decision maker' processes information presented during a case or trial in accordance with his or her own personal beliefs and preferences, it does not dictate that a forensic expert should communicate information to be considered in the form of an LR,” they conclude.

The LR debate has potential ramifications in the increasingly complex world of DNA mixture interpretation. (NIST last week announced it would be conducting tests and analyses of the various mixture methods currently available to forensic DNA analysts, including probabilistic genotyping software programs like STRmix and TrueAllele.)

Some argue that it is simply the translation and communication of LR that need to be refined so that non-mathematicians can understand it. Some experts have suggested ways to express an LR “in plain English” to laypeople, using the strength of the match between the evidence and a suspect as the numerator, and the probability of a coincidental match as the denominator.
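As a rough sketch of that numerator-and-denominator idea, the Python below divides one probability by the other and wraps the result in a sentence. The function names, the wording of the sentence and the input values are all hypothetical, chosen only for illustration; they are not a phrasing recommended by NIST or by any of the experts mentioned.

```python
def likelihood_ratio(p_if_suspect_is_source, p_if_coincidence):
    """Numerator: probability of the observed match if the suspect is the source.
    Denominator: probability of the same match arising by coincidence."""
    if p_if_coincidence <= 0:
        raise ValueError("coincidental-match probability must be positive")
    return p_if_suspect_is_source / p_if_coincidence


def plain_english(lr):
    """Turn an LR into one illustrative sentence (hypothetical wording)."""
    if lr >= 1:
        return (f"The evidence is about {lr:,.0f} times more probable if the "
                "suspect is the source than if the match is a coincidence.")
    return (f"The evidence is about {1 / lr:,.0f} times more probable if the "
            "match is a coincidence than if the suspect is the source.")


# Hypothetical inputs: a match that is certain if the suspect is the source,
# and a 1-in-1,000 chance of a coincidental match.
lr = likelihood_ratio(1.0, 0.001)
sentence = plain_english(lr)
print(sentence)
```

Note that the sentence describes how probable the evidence is under each explanation, not how probable it is that the suspect is the source; keeping those two statements apart is part of what makes communicating an LR difficult.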

The two NIST statisticians indicate that DNA mixtures could indeed be an exception where LR could help parse out incredibly complex evidentiary situations.

“Forming a lattice of assumptions and uncertainty pyramid, including explicitly identifying what data will be considered, for applications in the field of high-template, low-contributor DNA evaluations could help to provide clarity to other forensic disciplines seeking to demonstrate or develop a basis for using a similar LR framework,” they write.

The debate over LR has been years in the making, and has only accelerated recently. A paper in the journal Forensic Science International in 2014 found the mere mention of numbers strengthened the persuasive power of experts. An entire issue of the journal Science and Justice was even devoted to the debate last year.