EXCLUSIVE: Brand New Deterministic Software Can Deconvolute a DNA Mixture in Seconds


Mixolydian's software solution has the potential to completely upend forensic DNA analysis. Credit: Mixolydian

In groundbreaking work that has the potential to completely upend forensic DNA analysis, a particle physicist has developed technology that can deconvolute a DNA mixture of up to 8 people in less than one minute—in a deterministic manner.

That part is critical, as the only currently available deconvolution solutions are probabilistic, not deterministic. Probabilistic genotyping software outputs a likelihood ratio (LR) to express the weight of evidence. It evaluates the evidence under a pair of alternative hypotheses, for example, the probability of the evidence if the DNA is from the suspect versus the probability of the evidence if the DNA is from an unknown, unrelated individual. Probabilistic genotyping software also requires that a suspect's DNA be available for comparison against the result.
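As a rough illustration of the likelihood ratio described above, the short Python sketch below divides the probability of the evidence under one hypothesis by its probability under the alternative. The function name and the example probabilities are hypothetical and are not taken from any probabilistic genotyping product.

```python
def likelihood_ratio(p_evidence_given_hp: float, p_evidence_given_hd: float) -> float:
    """Return LR = P(E | Hp) / P(E | Hd).

    Hp: the DNA is from the suspect.
    Hd: the DNA is from an unknown, unrelated individual.
    """
    if not 0.0 <= p_evidence_given_hp <= 1.0:
        raise ValueError("P(E | Hp) must be between 0 and 1")
    if not 0.0 < p_evidence_given_hd <= 1.0:
        raise ValueError("P(E | Hd) must be greater than 0 and at most 1")
    return p_evidence_given_hp / p_evidence_given_hd


# Hypothetical probabilities, chosen only to show the arithmetic.
lr = likelihood_ratio(0.8, 0.0004)
print(f"LR = {lr:.0f}")
```

An LR above 1 supports the first hypothesis and an LR below 1 supports the alternative. It expresses the weight of the evidence and does not itself identify a genotype.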

All of that is irrelevant with Jake Wortman’s technology from his brand-new company Mixolydian, which officially put its name out in the world Monday night, although Wortman has been perfecting the solution since 2016.

“You have to be very sure when you come out with something that is this different,” Wortman told Forensic in an exclusive interview. “It’s a completely different paradigm for how to analyze mixed-contributor DNA.”

It’s also an incredibly accurate one, with 100% test accuracy for DNA mixtures containing two contributors—a condition commonly seen in rape cases.

From physics to DNA

It’s unexpected that the possible solution to the DNA mixture deconvolution problem came not from a forensic scientist but from a physicist. At the same time, it makes sense that, after decades of attempts to overcome this limitation, the answer came in the form of novel mathematics.

With a background in particle and detector physics, Wortman worked on muon tomography for CERN before he started a job at Johns Hopkins Applied Physics Lab (APL) in 2016 doing data analysis. There, he got involved with a group that was in the early stages of generating ideas on how to circumvent the limitations of current mixed contributor DNA analysis.

Working on a Master’s degree in biomedical systems engineering, Wortman asked the group if he could run data analysis for them in order to familiarize himself with biological data. The group readily agreed, put together simulated mixtures, ran them on Verogen’s MiSeq FGx Sequencing system, and handed the data over.

“I had a hunch I had an idea, so I took to MATLAB, coded a prototype of the idea I had in my head, tested it with the simulated data, and it worked right off the bat,” Wortman recalled. “At this point, it’s been 4 weeks since I joined the group and I show up to meeting number two and I’m like, ‘hey, I think I solved this.’ They looked it over and agreed.”

After subsequent, successful human trials, Wortman spun the company off from APL and acquired additional funds to execute larger trials. Today, as the startup Mixolydian, they have tested over 250,000 unique mixtures—all demonstrating an accuracy of >97%.

How Mixolydian is different

Probabilistic genotyping programs are large Bayesian inference networks. In contrast, Mixolydian’s technology is a novel mathematical formula that performs a full deconvolution to output the genotypes of individual contributors. Where probabilistic genotyping software needs a suspect's DNA for comparison, Mixolydian provides the actual genotype for immediate input into CODIS and/or other databases to search for possible hits.

The startup is about two weeks away from publishing a white paper that contains the results of over 250,000 unique DNA mixture deconvolutions that an independent lab in Houston executed. The results described in the white paper are staggering.

Mixolydian classifies its accuracy in two ways: two-allele accuracy and at-least-one-allele accuracy. Two-allele accuracy is straightforward: it measures the accuracy at each locus that is being tested. For two-allele deconvolution of 2-person mixtures, Mixolydian achieves 99% accuracy per contributor, no matter the number of samples. In a 3-person mixture, the accuracy ranges from 98 to 99%. In a 4-person mixture, accuracy is 98% with 12 or more random samples.
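The article does not spell out how the two accuracy figures are computed. The Python sketch below shows one plausible reading, assuming a locus counts toward two-allele accuracy when both called alleles match the true genotype, and toward at-least-one-allele accuracy when one or more alleles match. The function name, data layout and example genotypes are hypothetical and are not Mixolydian's.

```python
from collections import Counter

# A genotype maps each locus name to its pair of alleles.
Genotype = dict[str, tuple[str, str]]


def locus_accuracies(truth: Genotype, called: Genotype) -> tuple[float, float]:
    """Return (two_allele_accuracy, at_least_one_allele_accuracy) over the true loci."""
    if not truth:
        raise ValueError("truth must contain at least one locus")
    both_correct = 0
    one_or_more_correct = 0
    for locus, true_alleles in truth.items():
        # Multiset intersection: allele order is ignored, homozygotes count twice.
        overlap = Counter(true_alleles) & Counter(called.get(locus, ()))
        shared = sum(overlap.values())
        both_correct += shared == 2
        one_or_more_correct += shared >= 1
    n_loci = len(truth)
    return both_correct / n_loci, one_or_more_correct / n_loci


# Made-up genotypes for one contributor at four STR loci.
truth = {"D3S1358": ("15", "16"), "vWA": ("17", "17"), "FGA": ("21", "24"), "TH01": ("6", "9.3")}
called = {"D3S1358": ("16", "15"), "vWA": ("17", "18"), "FGA": ("21", "24"), "TH01": ("7", "8")}

two_allele, at_least_one = locus_accuracies(truth, called)
print(f"two-allele: {two_allele:.2%}, at least one allele: {at_least_one:.2%}")
```

In this made-up example, two of the four loci are fully correct and three share at least one allele, which is why the second figure can never be lower than the first.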

However, Wortman says it is in the at least one allele space where Mixolydian’s novel algorithm really shines.

“The at least one allele thing sounds less than ideal but if you run the numbers on the allele frequencies at different loci in the world population, you find that you get a 1 in greater than 10 billionth chance of a double match as long as you’re doing at least one allele for at least 23 loci,” explains Wortman. “Knowing at least one is a match in each locus is enough to say there is probably no other match on the planet. Compare that to the match probability of CODIS, which is somewhere around 1 in 10,000, and you can say that at least one allele matching is entirely sufficient for justice applications.”
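To show the kind of arithmetic behind a figure like this, the Python sketch below estimates the chance that a random, unrelated person shares at least one allele with a reference genotype at every one of 23 loci. It is not Wortman's calculation: it assumes Hardy-Weinberg proportions, independence between loci, and an invented frequency of 0.1 for every allele, so the result only indicates the order of magnitude under those assumptions.

```python
import math


def p_shares_at_least_one(reference_allele_freqs: tuple[float, ...]) -> float:
    """Chance a random person carries one or more of the reference alleles at a locus.

    reference_allele_freqs holds the population frequency of each distinct allele
    in the reference genotype (two values if heterozygous, one if homozygous).
    Assumes Hardy-Weinberg proportions.
    """
    covered = sum(reference_allele_freqs)
    if not 0.0 <= covered <= 1.0:
        raise ValueError("allele frequencies must sum to a value between 0 and 1")
    # A random person misses every reference allele only if both of their
    # alleles fall outside the covered set.
    return 1.0 - (1.0 - covered) ** 2


def p_match_all_loci(loci: list[tuple[float, ...]]) -> float:
    """Multiply the per-locus chances, assuming the loci are independent."""
    return math.prod(p_shares_at_least_one(freqs) for freqs in loci)


# Invented profile: 23 heterozygous loci, every allele at frequency 0.1.
hypothetical_profile = [(0.1, 0.1)] * 23
p_coincidence = p_match_all_loci(hypothetical_profile)
print(f"chance of a coincidental match: about 1 in {1 / p_coincidence:,.0f}")
```

With these invented inputs, each locus contributes a factor of 0.36, and the product over 23 loci is on the order of 1 in 10 billion. Real allele frequencies vary by locus and population, so an actual figure would differ.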

In at least one allele testing, Mixolydian has yet to return anything other than a 100% accurate result for a 2-person mixture. That 100% accuracy is also reflected in 3-person mixtures from 9 to 18 random samples. With 4-person mixtures, the highest is 100% while the lowest is 99.835%. The accuracy rate barely takes a hit as the numbers grow: nothing under 99% accuracy in 5-person mixtures; the high for 6-person mixtures is 99%, the low is 98.5%; 7-person mixtures fall in the 97 to 98% accuracy region; and 8-person mixtures are at 97.5%.

When Mixolydian’s solution exports a number of contributors, what the software is really saying is, “this is the maximum number of contributors I can and will deconvolve.” That is 100% intentional as Wortman says he wants his platform to “fail gracefully.”

“I don’t ever want there to be a situation where it thinks it is more confident than it is,” he said. “We’ve never once, in any of our testing, estimated a number of contributors that is greater than the truth. We’ve never once fabricated a human being in a mixture. That is very much built into the technology—that cannot happen, it will not do that.”


If, for whatever reason, the number of contributors is inaccurate, the reported number will always be lower than the actual number. If this does occur, the technology will assign a confidence value to the mixture. For example, in an 8-person mixture, it will say, “I can find only 4 people in this mixture, but I am confident there are more.”

Other distinguishing factors

Beyond accuracy, Mixolydian separates itself from the pack by harnessing the power of multi-sampling.

“If you give the software five samples of a mixture, then we can take in all five of those samples and produce a deconvolution for the whole pack,” said Wortman. “You don’t need everyone to be in every sample. We’ve tested that out. If people are missing from certain samples, it doesn’t really matter.”

For example, say a firearm is left at a crime scene. Traditionally, investigators swab the trigger and sequence that, then they swab the handle and sequence that, then maybe they swab the bullets and sequence that. Mixolydian, however, accepts the swab from the trigger, the swab from the handle, the swab from the bullet, and any others all at the same time. Once placed into the software, the platform then outputs how many people are contained in that entire set, as well as the full genotypes of all the contributors.

Additionally, the software outputs the data in seconds. Probabilistic genotyping software can be extremely resource-intensive, even requiring a supercomputing cluster for mixtures with 5 or more contributors. According to Wortman, Mixolydian’s technology can run a full analysis on mixtures from 2 to 8 contributors on a single CPU in 44 seconds.

“I’m still in MATLAB world, so that will get highly optimized and drop significantly,” says the physicist.

Although probabilistic genotyping software like TrueAllele and STRmix has been affirmed in court as “generally accepted as reliable by the relevant scientific community,” the challenges and filings against this type of software keep coming. Defense attorneys have argued that they require access to the software’s source code to conduct an independent review. The companies behind both software programs have allowed that to happen, although Cybergenetics has taken the additional step of protecting TrueAllele’s source code as a trade secret, a claim it has won in court multiple times.

Wortman says he does not plan to treat his source code as a trade secret. Rather, he welcomes experts to review it, saying it is easily “black box testable.”

“Our code is very simple. It’s very concise,” said Wortman. “I’d be shocked if it took anyone more than a day to view. It’s not one of these giant Bayesian inference networks, nor is it an AI beast—it’s a mathematical formula that is novel and has lots of things attached to it that are novel, but at the end of the day it’s a new piece of math—so it’s easy to review.”

More to come

This is only the beginning for Mixolydian. They may have just launched themselves into the public eye, but Rob Kramer, the CEO, says they are already in deep discussions with multiple law enforcement entities to coordinate pilots. The startup is also focusing on a few high-profile cold cases, as well as cases that have been settled with possibly the wrong suspect behind bars.

“We are doing our due diligence right now,” said Kramer. “We want to have these discussions only where we think we can help and we’re relevant.”

Although the long-awaited white paper validating the promise of Mixolydian’s technology is just around the corner, Wortman says the startup is still embarking on a couple more trials. One is a low read test where they are testing down to 7.5 picograms of DNA to see if the platform still performs well. The other is a degraded DNA test in collaboration with a laboratory that often works with damaged evidence and “dirty DNA.”

At the moment, Kramer and Wortman see wrongful accusations, cold cases, sexual assaults and firearm analysis as the forensic areas where Mixolydian can make a difference. Additionally, as more and more labs adopt next-generation sequencing, there will most likely be an increase in DNA mixtures, leading Kramer to predict that Mixolydian will “be more and more useful to cases as time goes on.”

“You think about all the different applications this has and it becomes terrifying, almost, what has been used before, but also very exciting for things we can clear up,” said Wortman. “To be able to definitively say this person is in a mixture and process things so quickly and cleanly and so ‘explain-ably’—situations like the rape kit backlog and falsely accused people in prison. It just sort of punches you in the chest. We finally have the technology to fix a lot of societal issues and help a lot of people. When I figured it out, it was a quick, ‘Yeah, I did it!,’ and then a very sobering, ‘We have a lot of work to do.’”

 
