The Future of Forensics Has Arrived: The Application of Next Generation Sequencing
Over the last two decades, numerous innovative technological advances have fueled the expansion of the biological and chemical sciences. Following refinement and rigorous testing, many have been adopted by forensic DNA testing laboratories, including polymerase chain reaction, capillary electrophoretic instrumentation, automated liquid handling, and expert software systems. Robust validation processes often slow the introduction of new tools for testing evidentiary samples. However, this methodical diligence has protected the credibility of DNA evidence and resulted in DNA becoming the gold standard for forensics and the most effective crime-fighting tool available to law enforcement.1
The progression from Sanger-type sequencing (STS) to next generation sequencing (NGS) will further advance the regenerative sciences, personalized medicine, and forensics. Today, sequencing of mitochondrial DNA (mtDNA) is already a validated and widely accepted tool for helping to resolve complex cases. Advances in this method through the introduction of next generation systems are only one way in which these new technologies will add information to forensic cases.
Laboratories are expected to migrate from current methods, including mitochondrial sequencing, to more efficient technology platforms. In addition, forensic facilities that offer specialized testing for missing persons and mass disaster victim identification will gain significant capabilities by leveraging these next generation testing systems.
What is it?
The emergence of NGS technologies has yielded an unprecedented wealth of data, enabling the advancement of genomic practices including whole genome sequencing, RNA-seq, and massively parallel single nucleotide polymorphism (SNP) and amplicon resequencing. A number of smaller-scale benchtop NGS platforms have surfaced to meet the need for an affordable, fast, easy-to-use instrument.2 These platforms deliver from 10 megabases to several gigabases of sequence in a matter of hours to days and allow for multiplexing across both samples and loci.3 With this level of throughput, forensic applications can now be approached with NGS.
How can it be used?
NGS Applications in Human Identification
Among its many uses, the applications most relevant to forensic testing are the sequencing of short tandem repeats (STRs), SNPs, and mtDNA. First, NGS technology enables large multiplexes to be sequenced with a power of discrimination comparable to or greater than that of existing commercial kits. In addition to providing discriminatory power based on allele length, NGS yields sequence content, which can then be used to further dissect mixtures by base composition in the highly polymorphic repeat structure. Second, in the presence of scant or degraded DNA, identity or kinship analysis can be conducted with SNPs on fragments as short as 60–70 nucleotides. Other SNP classes, including ancestry-informative SNPs, could be employed in the future to assist in routine forensic analysis and provide investigative leads and subject exclusion.4 Third, haplotyping for kinship, phylogenetics, or analysis of degraded DNA can be carried out by sequencing Y-SNPs or mtDNA. With the latter, the mitochondrial genome can be sequenced either in its entirety or only in part, for example the highly polymorphic sites and regions including the D-loop. Finally, all of these applications can potentially be completed in parallel, because multiplexes can be designed in any combination. This preserves sample while speeding up the delivery of probative information to law enforcement for use in active cases.
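The point about sequence content adding resolution beyond allele length can be illustrated with a minimal sketch. The repeat sequences below are hypothetical and do not represent a real STR locus or any kit's output; the function names are likewise illustrative only. Two alleles with the same number of repeat units look identical when typed by length, but separate once the underlying sequence is read.

```python
from collections import Counter

REPEAT_LENGTH = 4  # assumed tetranucleotide repeat unit (hypothetical locus)


def alleles_by_length(reads):
    """Group reads the way length-based typing would: by repeat count only."""
    return Counter(len(read) // REPEAT_LENGTH for read in reads)


def alleles_by_sequence(reads):
    """Group reads by their full sequence, as sequencing allows."""
    return Counter(reads)


# Two hypothetical alleles, both 10 repeat units long, differing by one base.
allele_a = "TCTA" * 10
allele_b = "TCTA" * 4 + "TCTG" + "TCTA" * 5

# A toy two-contributor mixture: 6 reads of one allele, 4 of the other.
reads = [allele_a] * 6 + [allele_b] * 4

print("distinct alleles by length:  ", len(alleles_by_length(reads)))
print("distinct alleles by sequence:", len(alleles_by_sequence(reads)))
```

In this toy mixture the length-based view reports a single allele, while the sequence-based view separates the two contributors' alleles and retains their read counts.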
How does it work?
Workflow from Sample to Data
In general, NGS involves three fundamental parts: library preparation, sequencing, and data analysis (Figure 1). Whether working from genomic DNA or total nucleic acid of a metagenomic sample, library preparation involves obtaining fragments or amplicons of DNA of appropriate sequencing length (200 bp to 400 bp) and flanking them with adaptors to be used in template preparation and sequencing (Figure 1b). Prior to sequencing, most platforms require a step, bead-based emulsion PCR, to clonally amplify templates with the correct adaptor topology (Figure 1c). This product serves as the template for each sequencing reaction. In semiconductor-based sequencing, the incorporation of natural nucleotides is detected by transistors sensitive to the release of hydrogen ions during polymerization5 (Figure 1d). These signals are then translated into sequence, and the resulting data files are used as the input to bioinformatics pipelines that deliver a variety of results, from a genotype to the consensus sequence of a genome.
Figure 1: Semiconductor-based next generation sequencing. Next generation sequencing can be divided into three main constituents: library preparation, sequencing, and data analysis. Library preparation (a) begins from genomic DNA, cDNA amplicons, or a metagenomic sample. Fragments of appropriate size (usually between 200 bp and 400 bp) are flanked by platform-specific adaptor sequences (b) and then clonally amplified on a bead before sequencing (c). Sequencing occurs as the on-bead templates are exposed to flows of nucleotides, and incorporation of bases is detected through the release of hydrogen ions using a pH-sensitive layer coupled to a transistor (d). Finally, changes in voltage are interpreted as sequence, and base calling generates a sequence file that can be interpreted by bioinformatics pipelines and software or platform-specific plugins.
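The translation of per-flow signals into sequence can be sketched in a few lines. This is a deliberately simplified toy model, not the platform's base caller: the flow order, the signal values, and the function name are assumptions made for illustration, and real base calling also involves signal normalization and phase correction. The idea shown is only that each nucleotide flow produces a signal roughly proportional to the number of bases incorporated, so a homopolymer run appears as a single, larger signal.

```python
def decode_flowgram(flow_order, signals):
    """Translate per-flow signal intensities into a base sequence.

    Each flow exposes the template to one nucleotide. The rounded signal
    approximates how many bases were incorporated during that flow:
    0 means no incorporation, 2 means a homopolymer of length two, etc.
    """
    bases = []
    for index, signal in enumerate(signals):
        nucleotide = flow_order[index % len(flow_order)]  # flows repeat cyclically
        count = max(0, round(signal))
        bases.append(nucleotide * count)
    return "".join(bases)


# Hypothetical cyclic flow order and normalized signals for eight flows.
flow_order = "TACG"
signals = [1.02, 0.05, 0.97, 0.03, 0.04, 2.10, 0.01, 1.05]

print(decode_flowgram(flow_order, signals))
```

Because run length is inferred from signal magnitude, a long homopolymer whose signal falls between two integers is the natural place for such a scheme to miscount, which is one reason homopolymeric stretches receive special attention in evaluations of this chemistry.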
Does it work?
One Example: Sequencing of mtGenomes with the PGM
Highly compromised samples often cannot be successfully characterized with autosomal DNA markers, even when employing next generation STR chemistry. The analysis of mtDNA has become the last resort for such samples because mtDNA is present in much higher copy numbers in the cell and is therefore more likely to provide useful results. The major limitation of its application to forensics has been the relatively low discrimination power of mtDNA, as maternally related individuals share the same haplotype. Historically, the so-called hypervariable segment 1 (HVS-1) of the mitochondrial control region, the only large non-coding region in the mtGenome, was targeted, providing a random match probability (RMP) of roughly 1 in 30 individuals. With the entire mtDNA control region, the RMP improved to 1 in 120, which has provided useful evidence in many cases.6 Extending the analysis to the entire mtGenome is a logical consequence and a desirable goal to maximize the information content of mtDNA analyses.7 Conventional Sanger-type sequencing may, however, not be a suitable method, as it is laborious and prone to error, as previous studies have shown.8 In the early years of mtDNA typing, earlier versions of the sequencing polymerase resulted in numerous phantom mutations,9 but more frequently transcription errors10 and artificial recombination11 were responsible for flawed data.
In an effort to increase the quality of forensic mtDNA databasing, the EMPOP project (www.empop.org)12 offered new protocols for mtDNA typing, revised guidelines for reporting mtDNA evidence,6 and introduced a novel mtDNA database query engine for the retrieval of reliable frequency estimates.13 Meanwhile, the two leading forensic genetic journals, Forensic Science International: Genetics and International Journal of Legal Medicine, require authors to have mtDNA population studies quality controlled by EMPOP prior to submission for review, which has significantly improved the overall quality of mtDNA population data published in forensic genetics since 2010.
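To make the effect of added sequence resolution on the RMP concrete, the following minimal sketch estimates the RMP as the sum of squared haplotype frequencies in a sample. That simple estimator is an assumption chosen for illustration, and the six-person dataset and haplotype labels are hypothetical; they are not taken from the studies cited above.

```python
from collections import Counter


def random_match_probability(haplotypes):
    """Estimate the RMP as the sum of squared haplotype frequencies."""
    if not haplotypes:
        raise ValueError("at least one haplotype is required")
    counts = Counter(haplotypes)
    total = sum(counts.values())
    return sum((n / total) ** 2 for n in counts.values())


# Hypothetical sample of six individuals typed at two levels of resolution.
# With more sequence, haplotype "A" splits into three distinguishable types.
control_region_only = ["A", "A", "A", "B", "B", "C"]
full_mtgenome = ["A.1", "A.2", "A.3", "B.1", "B.1", "C.1"]

rmp_low = random_match_probability(control_region_only)
rmp_high = random_match_probability(full_mtgenome)

print(f"lower resolution:  RMP = {rmp_low:.3f} (1 in {1 / rmp_low:.1f})")
print(f"higher resolution: RMP = {rmp_high:.3f} (1 in {1 / rmp_high:.1f})")
```

A lower RMP means greater discrimination power: as shared haplotypes are split into rarer ones, the chance that two unrelated individuals match by coincidence falls.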
Early problematic NGS data on mtGenomes show that a posteriori quality control is becoming increasingly necessary with the move to parallel sequencing technologies.14 In order to evaluate the performance and reliability of current NGS, a comparative study on 64 mtGenomes from diverse phylogenetic backgrounds (sub-Saharan African, Southeast Asian, and West Eurasian) and different biological sources (blood, buccal swabs, and paraffin-embedded tissue) was performed using the Ion Torrent Personal Genome Machine (PGM), with STS as the reference method.15 Twenty-two of those mtGenomes were sequenced redundantly using two different sequencing chemistry versions (100 bp and 200 bp chemistries) on the PGM to investigate the effect of different read lengths on sequencing precision (Figure 2).
The DNA libraries were prepared by long-range PCR amplification of two 8.5 kbp fragments, which were enzymatically sheared to yield the required fragment sizes of 130 and 260 bp. Adaptors were ligated and amplicons enriched using the Ion Torrent chemistries and instrumentation (Ion One Touch), and the targets were subsequently sequenced on a 316 chip. The 64 mtGenomes encompassed a total of 1,030,437 bp (Figure 2) that were directly compared between the two technologies.
In total, our laboratory identified 176 differences between outputs from STS (consensus sequences from redundant reactions) and the Ion Torrent mito variant caller (applying a threshold of 20% variant call frequency of total coverage). This equals a difference rate of 0.017%. The majority of differences (n=108, 61.4%) were found to lie within or in the vicinity of homopolymeric sequence stretches, and the application of different alignment algorithms (paired read) provided evidence that those differences did not stem from the raw data generated by the PGM but depended largely on the alignment method applied. These results are extremely promising, and sequencing of the entire mtGenome is expected to significantly improve the discrimination power (that is, lower the RMP) of mtDNA analysis when evaluating highly compromised samples.
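The figures reported above can be checked with simple arithmetic, and the variant call threshold can be expressed as a one-line rule. In the sketch below the counts (176 differences, 1,030,437 bp compared, 108 homopolymer-associated differences) come from the text; the function names, the example read counts, and the choice to treat the 20% threshold as inclusive are assumptions for illustration and do not describe the actual variant caller.

```python
def difference_rate(differences, bases_compared):
    """Fraction of compared positions at which the two methods disagree."""
    return differences / bases_compared


def passes_threshold(variant_reads, total_coverage, threshold=0.20):
    """Call a variant only if its share of total coverage reaches the threshold.

    Treating the threshold as inclusive (>=) is an assumption of this sketch.
    """
    if total_coverage == 0:
        return False
    return variant_reads / total_coverage >= threshold


# Counts reported in the text.
total_differences = 176
bases_compared = 1_030_437
homopolymer_differences = 108

rate = difference_rate(total_differences, bases_compared)
homopolymer_share = homopolymer_differences / total_differences

print(f"difference rate:   {rate * 100:.3f}%")
print(f"homopolymer share: {homopolymer_share * 100:.1f}%")

# Hypothetical positions with 100x coverage.
print(passes_threshold(25, 100))  # variant seen in 25% of reads
print(passes_threshold(19, 100))  # variant seen in 19% of reads
```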
NGS is one of the most relevant advances that continue to revolutionize the sciences and fuel discovery. Forensic DNA testing is no exception. By applying the most sophisticated technology, with relentless validation and testing of each new method, forensic analysts gain access to tools that provide the most informative results. Obtaining the best and most probative results is important when evaluating criminal cases. While the application of NGS to mtDNA sequencing is occurring today, it is anticipated that in the very near future new panels will be developed and used to produce results in highly complex cases, including missing persons and mass disaster victim identification. As specialty centers begin implementing these panels in forensic investigations, they will also pave the way for analysts to further leverage the power of next generation sequencing for routine forensic investigative purposes.
- Rothberg et al. An integrated semiconductor device enabling non-optical genome sequencing. Nature (2011) vol. 475 (7356) pp. 348-52
- Loman et al. Performance comparison of benchtop high-throughput sequencing platforms. Nat Biotechnol (2012) vol. 30 (5) pp. 434-9
- Budowle and van Daal. Forensically relevant SNP classes. BioTechniques (2008) vol. 44 (5) pp. 603-8, 610
- Merriman et al. Progress in ion torrent semiconductor chip based sequencing. Electrophoresis (2012) vol. 33 (23) pp. 3397-417
- Parson and Bandelt. Extended guidelines for mtDNA typing of population data in forensic science. Forensic Sci Int Genet (2007) vol. 1 (1) pp. 13-9
- Irwin et al. mtGenome reference population databases and the future of forensic mtDNA analysis. Forensic Sci Int Genet (2011) vol. 5 (3) pp. 222-5
- Forster. To err is human. Ann Hum Gen (2003) vol. 67 (1) pp. 2-4
- Parson. The art of reading sequence electropherograms. Ann Hum Gen (2007) vol. 71 (2) pp. 276-8
- Parson et al. The EDNAP mitochondrial DNA population database (EMPOP) collaborative exercises: organisation, results and perspectives. Forensic Sci Int (2004) vol. 139 (2-3) pp. 215-26
- Bandelt et al. Artificial recombination in forensic mtDNA population databases. Int J Legal Med (2004) vol. 118 (5) pp. 267-73
- Parson and Dür. EMPOP-a forensic mtDNA database. Forensic Sci Int Genet (2007) vol. 1 pp. 88-92
- Röck et al. SAM: String-based sequence search algorithm for mitochondrial DNA database queries. Forensic Sci Int Genet (2011) vol. 5 (2) pp. 126-32
- Bandelt and Salas. Current next generation sequencing technology may not meet forensic standards. Forensic Sci Int Genet (2011) vol. 6 (1) pp. 143-5
- Parson et al. Evaluation of next generation mtGenome sequencing using the Ion Torrent Personal Genome Machine (PGM). In prep.
Lisa Lane Schade is the Director of Global Market Development for the Human Identification Business at Life Technologies. For over a decade, she’s worked with forensic scientists and thought leaders throughout the world to develop programs and products targeted at solving complex forensic challenges.
Sharon Wootton received her Ph.D. in Bioengineering from UC Berkeley. She is a senior research scientist working on next generation forensic technologies at Life Technologies.
Walther Parson leads an active group of scientists at the Institute of Legal Medicine, Innsbruck Medical University, that has published more than 200 peer-reviewed articles in the forensic and medical genetic fields within the past 10 years. His research helped to establish the National DNA Database Laboratory of the Austrian Federal Ministry of the Interior, and he is an expert in using advanced DNA techniques to resolve highly complex identification cases.