The Benefits of Deconvolution


by Paul Stafford Allen, Forensic Scientist, STRmix

DNA profiling has been a staple for identifying individuals involved in criminal investigations since the early 1990s. Analysts relied on fixed stochastic thresholds and other parameters (peak height ratios, mixture proportion ratios, stutter ratios, etc.) to interpret DNA profiles manually. This heuristic methodology worked well, and was generally accepted in court proceedings, for single-source profiles and most two-person mixtures.

Unfortunately, the heuristic methodology can be painfully slow, and less reliable, when DNA mixtures are more complex. As a result, the majority of forensic labs have now turned to probabilistic genotyping software that uses mathematical models to deconvolute DNA mixtures. Deconvolution enables an investigator to separate the combined DNA signals to reveal individual genetic profiles in low-level, degraded, or mixed samples from multiple contributors. With the adoption of probabilistic genotyping, labs can now interpret increasingly complex mixtures, and we are increasingly interested in how much probative value these complex mixtures carry. For example, are the expected match statistics compelling enough that we should seek a reference sample from a person of interest?

Let’s examine a hypothetical case in which a mixed DNA profile is obtained from the innermost layer of wrapping of a package of controlled drugs found at the crime scene. The profile appears to be a three-person mixture, and interpretation with STRmix probabilistic genotyping software gives estimated mixture proportions of 59%, 38%, and 13%. The electropherogram for this mixture (Figure 1) shows average peak heights of roughly 500 rfu and 200 rfu for the two smaller components, respectively.

In this case, let’s suppose that we are interested in the probative value of the 13% minor component of the profile. We can use the investigative application “Explore Deconvolution” within DBLR software to determine whether this component is suitable for comparison with any reference profiles (standards) submitted in this case. The Explore Deconvolution function uses the DNA profile to simulate thousands of profiles under different proposed scenarios and then assigns a likelihood ratio (LR) to each simulated profile. Here, we can simulate profiles for individuals to address two specific hypotheses:

  1. that component 3 of the mixed DNA originated from the simulated individual and two unknown individuals; or
  2. that this DNA originated from three unknown, unrelated individuals.
Figure 2

Genotypes are simulated under each hypothesis and the LRs computed. The logarithms of the LRs can then be plotted (Figure 2). Because the logarithm of 0 is undefined, LRs of 0 are plotted at log10(LR) = -40. The black vertical line marks the chosen LR threshold, here set to one million (log10(LR) = 6).

The simulation results indicate that, for the 13% minor component, the probability that a true contributor obtains an LR greater than one million is about 77.6%: that is the share of the blue distribution lying to the right of the black vertical line in Figure 2. Although the most common log10(LR) (the mode of the distribution) is about 7.5, the LRs are widely spread, which indicates that many genotypes could not be ruled out in the deconvolution.
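The simulate-then-summarize idea can be sketched in a few lines of Python. This is a toy under stated assumptions, not DBLR’s implementation: the single-locus weights and frequencies are hypothetical, the same locus is reused for eight independent loci, simulated true contributors are drawn from the deconvolution weights, simulated non-contributors are drawn from population genotype frequencies, and each per-locus LR uses the simplified single-component form LR = w(g) / Σ w(h) p(h).

```python
import math
import random

# Hypothetical toy locus (reused for every locus): W = deconvolution weights
# for the minor component, P = population genotype frequencies. Not real data.
W = {"12,14": 0.6, "12,12": 0.3, "14,15": 0.1}
P = {"12,14": 0.05, "12,12": 0.02, "14,15": 0.04, "13,13": 0.89}
N_LOCI = 8            # hypothetical; loci assumed independent
FLOOR = -40.0         # where LR = 0 is drawn on the log10 scale
THRESHOLD = 6.0       # log10(one million)

DENOM = sum(w * P[g] for g, w in W.items())  # sum_h w(h) p(h)

def log10_lr(profile):
    """Sum of per-locus log10 LRs; FLOOR if any locus gives LR = 0."""
    total = 0.0
    for g in profile:
        if W.get(g, 0.0) == 0.0:
            return FLOOR
        total += math.log10(W[g] / DENOM)
    return total

def simulate(source, n, rng):
    """Draw n multi-locus genotypes from `source`; return their log10 LRs."""
    genotypes, probs = zip(*source.items())
    return [log10_lr(rng.choices(genotypes, probs, k=N_LOCI)) for _ in range(n)]

rng = random.Random(1)
hp = simulate(W, 10_000, rng)  # assumed true contributors
hd = simulate(P, 10_000, rng)  # assumed unrelated non-contributors
p_true = sum(x > THRESHOLD for x in hp) / len(hp)
p_false = sum(x > THRESHOLD for x in hd) / len(hd)
print(f"P(LR > 1e6 | contributor)     ~ {p_true:.3f}")
print(f"P(LR > 1e6 | non-contributor) ~ {p_false:.3g}")
```

Plotting histograms of `hp` and `hd` on the log10 scale would give a picture of the same kind as Figure 2, with the non-contributors piled up at -40.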

The results for the simulated non-contributors (red distribution) show that none of them obtained an LR greater than one million, so the estimated probability of exceeding the threshold given a non-contributor is 0. By default, we simulate 100,000 profiles under each hypothesis. This number is user configurable, but given the power of modern STR DNA analysis and assuming a reasonable-quality profile, we often expect the vast majority of non-contributor profiles to be excluded, with only a small proportion yielding inclusive LRs. A sample of 100,000 therefore says little about events much rarer than about one in 100,000.
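How little a zero count reveals can be quantified with the standard zero-event bound (often called the “rule of three”): if none of n independent simulated non-contributors exceeds the threshold, the one-sided 95% upper bound on the per-profile probability solves (1 − p)ⁿ = 0.05, which is roughly 3/n. The sketch below is a generic statistical illustration, not a DBLR feature.

```python
def upper_bound_zero_events(n, confidence=0.95):
    """One-sided upper confidence bound on p when 0 of n trials are hits.

    Solves (1 - p) ** n = 1 - confidence for p; approximately 3 / n at 95%.
    """
    return 1.0 - (1.0 - confidence) ** (1.0 / n)

n = 100_000  # default number of simulated non-contributors
bound = upper_bound_zero_events(n)
print(f"95% upper bound after 0 of {n} exceedances: {bound:.3g}")
```

With n = 100,000 the bound is about 3 × 10⁻⁵, so plain random sampling cannot tell a one-in-a-million rate from a one-in-16-million rate.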

To fully explore the chance of an adventitious match using uninformed (random) sampling, we would need to sample millions or potentially billions of non-contributors. Instead, we can use importance sampling within DBLR: sampling is deliberately biased toward the genotypes most likely to result in an adventitious match, and that bias is then accounted for when the results are summarized. Using importance sampling, the probability that a non-contributor exceeds the threshold is estimated as 6.13 × 10⁻⁸ (approximately one in 16 million). Together, these results (a high probability of detecting a true contributor and a very low probability of falsely including a non-contributor) suggest that this component is suitable for meaningful comparison with a reference profile database.
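The general importance-sampling idea can be sketched on the same kind of hypothetical toy data. The choice of proposal distribution here (drawing genotypes from the deconvolution weights, then multiplying each hit by p/q) is an assumption for illustration; the article does not describe DBLR’s actual scheme. Because the toy is tiny, an exact enumeration is included as a check.

```python
import itertools
import math
import random

# Hypothetical toy locus (reused for every locus); not STRmix/DBLR output.
W = {"12,14": 0.6, "12,12": 0.3, "14,15": 0.1}                   # weights
P = {"12,14": 0.05, "12,12": 0.02, "14,15": 0.04, "13,13": 0.89}  # pop. freqs
N_LOCI = 8
LOG_THRESHOLD = 6.0
DENOM = sum(w * P[g] for g, w in W.items())

def exceeds(profile):
    """True if the multi-locus LR exceeds 10**LOG_THRESHOLD (LR = 0 never does)."""
    if any(g not in W for g in profile):
        return False
    return sum(math.log10(W[g] / DENOM) for g in profile) > LOG_THRESHOLD

def importance_estimate(n, rng):
    """Estimate P(LR > threshold | non-contributor) by sampling from q = W.

    Non-contributors follow P, but drawing from q concentrates samples where
    large LRs occur; each hit is reweighted by P(profile) / q(profile).
    """
    genotypes, q_probs = zip(*W.items())
    total = 0.0
    for _ in range(n):
        profile = rng.choices(genotypes, q_probs, k=N_LOCI)
        if exceeds(profile):
            total += math.prod(P[g] / W[g] for g in profile)
    return total / n

def exact_probability():
    """Brute-force sum over all 4**8 profiles; feasible only for a toy."""
    return sum(math.prod(P[g] for g in profile)
               for profile in itertools.product(P, repeat=N_LOCI)
               if exceeds(profile))

est = importance_estimate(100_000, random.Random(7))
print(f"importance sampling: {est:.3g}   exact: {exact_probability():.3g}")
```

Plain sampling from P would need on the order of a hundred million draws to see even one exceedance here; drawing from q puts nearly every draw in the region that matters.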

What further investigation can be done?

At the early stages of the investigation there are no persons of interest, but let’s assume that there is a database containing one million individuals against which we could compare the deconvoluted profile. The third component (the 13% minor), however, does not appear suitable for uploading to CODIS, the software system developed by the FBI to store and compare DNA profiles within a multi-tiered system of databases.

Based on the analyst’s experience, no single searchable genotype can be confidently attributed to this contributor at any locus. We can reinforce this assessment by using the Explore Deconvolution module to list the most likely genotypes, or the most likely alleles, per contributor and per locus, as shown in Figure 3. In this instance, a variety of plausible genotypes are listed and none receives enough weight to allow confident searching, which agrees with the analyst’s interpretation. Given that, how can the third component of the mixture be compared to a database containing a million profiles?

Figure 3
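Listing the highest-weighted genotypes per locus, and checking whether any single genotype dominates, can be sketched as follows. The loci, alleles, weights, and the 0.99 “confident call” cut-off are all hypothetical; a lab would set its own criteria.

```python
# Hypothetical per-locus genotype weights for component 3 (not real output).
COMPONENT_3 = {
    "D3S1358": {"15,16": 0.41, "15,17": 0.33, "16,17": 0.26},
    "vWA": {"17,18": 0.52, "18,18": 0.29, "17,17": 0.19},
}
SEARCHABLE_WEIGHT = 0.99  # hypothetical cut-off for a confident genotype call

def top_genotypes(weights, k=3):
    """Return the k highest-weighted genotypes as (genotype, weight) pairs."""
    return sorted(weights.items(), key=lambda item: item[1], reverse=True)[:k]

def confidently_called(component, cutoff=SEARCHABLE_WEIGHT):
    """Loci at which a single genotype carries at least `cutoff` of the weight."""
    return [locus for locus, w in component.items() if max(w.values()) >= cutoff]

for locus, weights in COMPONENT_3.items():
    print(locus, top_genotypes(weights))
print("confidently called loci:", confidently_called(COMPONENT_3))  # -> []
```

An empty list mirrors the situation described above: plausible genotypes exist at every locus, but none is certain enough to search as a single genotype.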

Using the Database Search function in DBLR, a complete profile deconvolution or a specified component of a mixture can be compared directly to a database, including databases containing millions of individuals. The database is not static: profiles can be added as needed and deleted if required. In addition, profiles deconvoluted in STRmix can be saved to a separate database, and automated comparisons between the two databases can be configured.

In the present example, component 3 of the deconvoluted mixture can be searched against our database of one million individuals in a matter of seconds, with the results ranked by LR. An LR is assigned for each database individual based on the following propositions:

  1. component 3 of the mixed DNA originated from a database individual and two unknown individuals; or
  2. this DNA originated from three unknown, unrelated individuals.
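A database search that scores every entry against one component and ranks the hits can be sketched as below. The LR uses a simplifying assumption that is not necessarily how DBLR computes it: loci are treated as independent and the component’s marginal per-locus weights are used, giving LR = Πₗ wₗ(gₗ) / Σ_g wₗ(g) pₗ(g). All loci, weights, frequencies, names, and the threshold of 100 (suited to a two-locus toy) are hypothetical.

```python
# Hypothetical marginal weights for component 3 and population genotype
# frequencies for the genotypes that carry weight (not real data).
WEIGHTS = {
    "D3S1358": {"15,16": 0.7, "15,17": 0.3},
    "vWA": {"17,18": 0.8, "18,18": 0.2},
}
FREQS = {
    "D3S1358": {"15,16": 0.04, "15,17": 0.03},
    "vWA": {"17,18": 0.06, "18,18": 0.02},
}

def component_lr(person):
    """LR: component 3 is this person vs. component 3 is an unknown person."""
    lr = 1.0
    for locus, weights in WEIGHTS.items():
        denom = sum(w * FREQS[locus][g] for g, w in weights.items())
        lr *= weights.get(person[locus], 0.0) / denom
    return lr

def search(database, threshold):
    """Score every entry; return (name, LR) hits above threshold, best first."""
    scored = ((name, component_lr(profile)) for name, profile in database.items())
    hits = [(name, lr) for name, lr in scored if lr > threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)

db = {  # hypothetical database entries
    "A": {"D3S1358": "15,16", "vWA": "17,18"},
    "B": {"D3S1358": "15,17", "vWA": "18,18"},
    "C": {"D3S1358": "14,16", "vWA": "17,18"},
}
hits = search(db, threshold=100.0)
print(hits)
```

Entry C receives an LR of 0 because its D3S1358 genotype carries no weight in the deconvolution; B is included by the weights but falls below the threshold.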

The LR assigned for one of the individuals in the database is approximately 2 × 10⁷. No other individual in the database yields an LR above the chosen threshold of one million. This match provides very strong intelligence for the investigating agency to use in this case.
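The figures reported earlier allow a quick check of how often a search of this size would be expected to produce an adventitious hit. Assuming the one million database members are unrelated non-contributors, each independently exceeding the threshold with the importance-sampling probability of 6.13 × 10⁻⁸, the arithmetic is:

```python
p_false = 6.13e-8        # P(LR > 1e6 | non-contributor), from importance sampling
database_size = 1_000_000

# Expected number of non-contributors above threshold, and the chance of >= 1.
expected_false_hits = database_size * p_false
p_any_false_hit = 1.0 - (1.0 - p_false) ** database_size
print(f"expected non-contributors above threshold: {expected_false_hits:.4f}")
print(f"P(at least one): {p_any_false_hit:.3f}")
```

About 0.06 adventitious hits are expected, which is why a single hit above one million is a strong lead. This is the chance of a false hit somewhere in the search, not the probability that this particular hit is false.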

Finally, let’s assume that a reference sample (standard) from this person of interest is submitted as part of the investigation. We can now use STRmix to assign an LR that evaluates the strength of the association, and this result can then be presented as evidence in court proceedings.

Moving beyond simple identification of contributors, suppose instead that no matching individual had been identified in the database. The same principles of Explore Deconvolution and Database Search can be applied to a familial search, that is, a search using hypotheses such as:

  1. component 3 of the mixed DNA originated from a [parent/child or sibling] of a database individual and two unknown individuals; or
  2. this DNA originated from three unknown, unrelated individuals.
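A single-locus version of the familial LR can be sketched with standard kinship (IBD) coefficients: (k0, k1, k2) = (0, 1, 0) for parent/child and (1/4, 1/2, 1/4) for full siblings. Simplifying assumptions, not necessarily DBLR’s model: Hardy-Weinberg genotype frequencies with no subpopulation correction, hypothetical allele frequencies, and independent loci (a multi-locus LR would be the product across loci).

```python
# Kinship (IBD) coefficients (k0, k1, k2) for standard relationships.
KINSHIP = {"parent/child": (0.0, 1.0, 0.0), "sibling": (0.25, 0.5, 0.25)}

def genotype_freq(g, p):
    """Hardy-Weinberg frequency of unordered genotype g = (a, b)."""
    a, b = g
    return p[a] ** 2 if a == b else 2 * p[a] * p[b]

def prob_given_relative(g, rel, p, kinship):
    """Pr(component genotype g | relative has genotype rel)."""
    k0, k1, k2 = kinship
    a, b = g
    # One allele IBD: one of rel's alleles (prob 1/2 each), the other random.
    one_ibd = 0.0
    for x in rel:
        if a == b:
            one_ibd += 0.5 * (p[a] if x == a else 0.0)
        else:
            one_ibd += 0.5 * ((p[b] if x == a else 0.0) + (p[a] if x == b else 0.0))
    two_ibd = 1.0 if sorted(g) == sorted(rel) else 0.0
    return k0 * genotype_freq(g, p) + k1 * one_ibd + k2 * two_ibd

def familial_lr(weights, rel, p, relationship):
    """LR: component is a relative of the database person vs. unrelated."""
    kin = KINSHIP[relationship]
    num = sum(w * prob_given_relative(g, rel, p, kin) for g, w in weights.items())
    den = sum(w * genotype_freq(g, p) for g, w in weights.items())
    return num / den

freqs = {"15": 0.2, "16": 0.1, "17": 0.3}  # hypothetical allele frequencies
component = {("15", "16"): 1.0}            # a fully resolved toy component
print(familial_lr(component, ("15", "17"), freqs, "parent/child"))
print(familial_lr(component, ("15", "16"), freqs, "sibling"))
```

For a fully resolved component these reduce to the textbook forms, e.g. 1/(4pₐ) for a parent/child sharing allele a; with an unresolved component the deconvolution weights spread the calculation over every plausible genotype.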

DBLR can run this search without any additional deconvolution, using a custom LR threshold chosen according to the type of search, the size of the database, and the features of the individual deconvolution. This opens up entirely new avenues of intelligence for an agency in the most serious and challenging cases.

Final thought

Our hypothetical case illustrates that deconvolution does much more than separate combined DNA signals for evidential comparisons. Using the Explore Deconvolution and Database Search functions, investigators can assess whether unresolved components of DNA mixtures are informative and then search them directly against DNA databases, including familial searches. Setting LR thresholds appropriate to each search, and screening out components with little power to discriminate contributors from non-contributors, helps investigations maintain a high level of reliability, ultimately helping to resolve criminal cases.

About the author: Paul Stafford Allen has been a forensic scientist for more than 20 years, with both laboratory-based and crime scene expertise in body fluid identification, blood pattern analysis, DNA analysis, and court reporting. His specialization is in mixed DNA sample evaluation and probabilistic genotyping. For more information, visit http://www.strmix.com.


