CODIS was set up nearly 30 years ago with an eye toward making it an exacting identification system based on 13 markers. Those 13 locations, or loci, on the genome were selected to only identify the individual—without including other data, like physical characteristics or ancestry.

But a team of scientists continues to find that CODIS holds more than was ever intended. Their latest paper, published in the Proceedings of the National Academy of Sciences, demonstrates that the 13 loci carry information that could indicate much of what the rest of a person's full DNA profile looks like.

“What this paper raises is a new way of thinking about the information encoded in the CODIS markers,” said Noah Rosenberg, the senior author, from Stanford University. “Much of the privacy discussion around the CODIS markers has been about whether the CODIS markers themselves encode or predict phenotypes. What we’re saying is that the CODIS markers, along with databases of other markers … could lead to prediction of phenotypes.”

By comparing the two sets of data, CODIS profiles and full DNA profiles, from the same set of 872 people, the researchers were able to match about 90 percent of the records.

“The potential for record matching of SNP and CODIS STR profiles, especially with augmented CODIS panels, uncovers new risks to privacy,” the team writes. “Thus, authorized or unauthorized analysts equipped with two datasets, one with SNP genotypes and another CODIS genotypes, could possibly identify some pairs of records that are likely to represent the same person.”

These matches, and the more detailed knowledge of the genomic tapestry that comes with them, could expose the physical description and ancestry information that was never intended to be part of the CODIS loci, whether there are 13 or 20 markers.

“For people with records linked in this way, CODIS genotypes would reveal genomic SNP genotypes that could, in turn, reveal much more information than the CODIS genotypes themselves—such as precise ancestry estimates, health and identification information that accompanies SNP records, and predictions for genetically influenced phenotypes,” they add.

Rosenberg told Forensic Magazine that if the full genome is considered a tapestry, then CODIS or other specific markers in other databases could identify which particular tapestry (that is, which individual) one is looking at.

“If you have some of the threads in the tapestry, then that might tell you what tapestry you’re looking at,” he said.

The same team, from Stanford University, the University of Michigan and the University of Manitoba, demonstrated in a preliminary study in the journal Current Biology last year that ancestry information tags along on the 13 CODIS loci. The same authors (Rosenberg, Bridget Algee-Hewitt and Michael Edge of Stanford, Jun Li from Michigan and Trevor Pemberton from Manitoba) worked on that study.

The next step of the work focuses on “relatedness”—and whether CODIS and medical databases could be linked, to find people related to one another, Rosenberg said.

CODIS was originally started as a pilot software project in 1990. But what started with 14 state and local laboratories ballooned with the DNA Identification Act of 1994. Currently, more than 190 public law enforcement agencies take part in the database within the U.S., and 90 labs in 50 countries abroad also use and contribute to it.

The amount of data on file has skyrocketed since the turn of the century. What was once under a half million profiles now includes nearly 12 million offenders, 2 million arrestees, and other profiles—and it grows every day.

The FBI increased the number of loci from 13 to 20 at the beginning of this year. Increasing the number of loci can also increase the amount of information, including ancestry, that is wrapped up in those markers, the researchers said.

Li, the author from the University of Michigan, told Forensic Magazine that the expansion to 20 loci would only increase the amount of information that comes from CODIS.

"It's going to be a more powerful dataset," Li said.
