A healthy 28-year-old man named Michael Hunter went to sleep one night. The next morning, one of his roommates, Joe Mannino, called 911 after failing to wake Hunter for work. By the time paramedics arrived, there was nothing they could do; Hunter was dead, with absolutely no trauma to his body. A subsequent toxicology report revealed a mixture of two different cold medicines and a lethal dose of lidocaine. Police also found two suicide notes saved on a floppy disk among Hunter’s possessions; but with some pieces of the puzzle not adding up, they needed a way to confirm Hunter was indeed the one who had authored the notes. 

They turned to Dr. Carole Chaski, now the leading expert in the field of forensic linguistics. She examined and compared syntactic patterns, which reflect the way an individual constructs a sentence and how words are used in relation to one another. 

Using a computer program she developed, Chaski found a distinctive use of adverbs in the suicide notes. Instead of using just one, the author had a habit of combining multiple adverbs in a sentence. This pattern was not found in any of Hunter’s known writing samples, and Chaski was able to unequivocally confirm Hunter did not write the suicide notes. The pattern did, however, appear in known writing samples from Mannino, who was, unbeknownst to Chaski at the time, a medical student with access to lidocaine. 

When confronted with the results of Chaski’s forensic linguistics analysis, Mannino admitted to writing the suicide notes. Mannino claimed Hunter asked to be injected in order to ease his headache pain, but Mannino used too much lidocaine and caused an accidental overdose. Mannino was eventually convicted of involuntary manslaughter with a seven-year sentence. 

There’s no doubt Chaski’s forensic linguistics analysis set the groundwork for proving Mannino was involved in Hunter’s death—one way or another. 
The case took place in 1992, and it was the first time Chaski had ever worked with police. But it certainly wouldn’t be her last.

“I have consulted [with police and attorneys] 60 to 70 times since, but only testified about six times since the suspect usually admits the crime, settles, or pleads out when presented with the evidence I find,” Chaski told Forensic Magazine. 

Linguistic v. linguistics v. stylistics

Lately, forensic linguistics has been in the news more often than usual. The Discovery Channel is retracing the infamous Unabomber case in its new miniseries, which places James Fitzgerald, the FBI profiler who used a type of language analysis to link Ted Kaczynski to his manifesto, front and center. And as it does every April on the anniversary of his death, questions about Kurt Cobain’s suicide note persist. Additionally, researchers from the University of Manchester (UK) are testing a new method, one they believe has already solved a historical mystery. 

Chaski is quick to point out that the term “forensic linguistics” can be ambiguous at best. She says the problem derives from the adjective “linguistic,” which can refer to anything having to do with language, covering all types of techniques that serve non-forensic purposes and are not grounded in science. A second, stricter definition requires expertise in linguistics, such as formal education, along with analytical tools and techniques that serve a forensic purpose. Like all science, the method must be repeatable. 

Fitzgerald played a pivotal role in the capture of Kaczynski. But it was Ted’s brother, David Kaczynski, who told police the writing in the published manifesto sounded like his brother, providing a 23-page document for comparison. Fitzgerald then did a side-by-side sentence analysis, marking the first time language analysis was used in a federal criminal case to obtain a search warrant. 

That form of language analysis, sometimes called forensic stylistics, is different from what Chaski does, as she has a formal education in linguistics and uses a computational linguistics method she developed called ALIAS, or Automated Linguistics Identification and Assessment System. 
Chaski actually used the system (described later in this article) to determine whether the letter found at Kurt Cobain’s side upon his death was actually a suicide note. With 81 to 86 percent accuracy, the program confirmed it was a “real” suicide note. Chaski said “the bottom half is a stereotypical suicide note, [while] the top half is more metaphorical and poetic.”

Researchers from the University of Manchester are tackling an Abraham Lincoln mystery in true forensic linguistics style, though with methods different from those Chaski employs, as Forensic Magazine reported last week.

In November 1864, a woman named Lydia Bixby received a letter supposedly written by President Abraham Lincoln informing her that her sons had been killed fighting in the Civil War. However, the letter, which subsequently became famous as one of the best letters written in the history of the English language, became contentious, as historians have wondered if it was actually authored by John Hay, Lincoln’s secretary. 

Dr. Andrea Nini, a lecturer in linguistics and English language at the University of Manchester, and his team designed a new method, based on a common approach, to analyze the Bixby letter. An n-gram is a sequence of elements, including but not limited to words, characters or grammatical structures. Because of the short length of the Bixby letter, Nini decided to look at the presence or absence of n-grams, rather than their frequency. 

“We looked at all of the n-grams size two, so all of the two-word combinations in the letter,” Nini explained to Forensic Magazine. “We checked whether these n-grams were present or absent in one author or the other. This method is new, it’s something we tried and saw it worked very well for solving this particular problem.”
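The presence-or-absence idea Nini describes can be sketched in a few lines. This is an illustrative reconstruction, not Nini's actual implementation; the texts, candidate corpora and scoring rule below are invented purely for demonstration.

```python
# Sketch of n-gram tracing with word bigrams (n-grams of size two):
# for each candidate author, measure what fraction of the questioned
# text's bigrams appear anywhere in that author's known writing.

def word_bigrams(text):
    """Return the set of adjacent word pairs in a text."""
    words = text.lower().split()
    return {(a, b) for a, b in zip(words, words[1:])}

def trace_score(questioned, known_corpus):
    """Fraction of the questioned text's bigrams found in the known corpus."""
    q = word_bigrams(questioned)
    k = word_bigrams(known_corpus)
    return len(q & k) / len(q) if q else 0.0

# Invented placeholder texts, not the actual Bixby letter or corpora.
questioned = "I pray that our Heavenly Father may assuage the anguish"
author_a = "our Heavenly Father watches over us and may assuage all grief"
author_b = "the committee shall convene at noon to review the ledger"

scores = {"A": trace_score(questioned, author_a),
          "B": trace_score(questioned, author_b)}
likely = max(scores, key=scores.get)
print(scores, "-> most likely author:", likely)
```

In the real study the comparison was run over substantial corpora of Lincoln's and Hay's known writings; with a text as short as the Bixby letter, presence or absence is a more stable signal than raw frequency.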

This led the team to conclude Hay, not then-President Lincoln, was the author of the Bixby letter. Nini acknowledges this was a simple case since there were only two possibilities: Lincoln or Hay. Thus, he and his team are now working to establish the upper limit of the method. They are taking small chunks, about 200 words, from famous novels to see if the method can correctly attribute the passages to one of up to 20 candidates.

“We want to see whether this new method works,” Nini said, “when it works, when it doesn’t work, when the old method breaks down. We aim to get this done ASAP and publish the results to disseminate to the field. Worst case, the method turns out to be another tool in the toolkit of a forensic linguist to apply in cases like the Bixby letter. Best case, it becomes a new way of doing analysis. We’ll know after we run enough tests.”

Even if the tests pass with flying colors, the method would still need to meet the Daubert standard to be admissible in a court of law.

Computational linguistics method

Obviously, Daubert and Frye play a substantial role in today’s legal system, but they’re not alone when it comes to forensic linguistics as evidence.

The Van Wyk standard must be applied in cases where language analysis will be entered into court proceedings. In the 2000 case of U.S. v. Roy Van Wyk, a district judge in New Jersey excluded part of FBI Agent James Fitzgerald’s testimony, limiting it to the comparison of characteristics or markers between the defendant’s known and unknown documents. Fitzgerald was barred from giving a conclusion about the authorship of any questioned writings. 

“The reliability of text analysis, much like handwriting analysis, is questionable because there is no known rate of error, no recognized standard, no meaningful peer review, and no system of accrediting an individual as an expert in the field,” reads the court document. 

Chaski and her ALIAS program meet all of those requirements; thus, the forensic linguistics expert can testify—stating a conclusion—in a court of law.

To operate, ALIAS SynAID needs 2,000 words or 100 sentences from known documents of each suspect writer, as well as the questioned document or documents. The software splits the documents into sentences and runs a part-of-speech tagger to determine the syntactic role of each word in each sentence. Chaski also has trained linguists verify the tags, since it’s easy for software to make mistakes, especially with the unedited natural language of text messages, emails or social media posts.
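That first step, splitting into sentences and tagging each word, can be illustrated with a toy pipeline. The tiny hand-made lexicon and tag names below are invented stand-ins; ALIAS's actual tagger and tag set are not public.

```python
# Toy illustration of sentence splitting and part-of-speech tagging.
# A real system uses a trained statistical tagger; this lookup table
# only shows the shape of the output: (word, tag) pairs per sentence.
import re

LEXICON = {"the": "DET", "a": "DET", "dog": "NOUN", "cat": "NOUN",
           "ran": "VERB", "slept": "VERB", "quickly": "ADV", "very": "ADV"}

def split_sentences(document):
    """Naive sentence splitter on terminal punctuation."""
    return [s.strip() for s in re.split(r"[.!?]+", document) if s.strip()]

def tag(sentence):
    """Assign each word a tag, falling back to UNK for unknown words."""
    return [(w, LEXICON.get(w.lower(), "UNK")) for w in sentence.split()]

doc = "The dog ran very quickly. A cat slept."
for sent in split_sentences(doc):
    print(tag(sent))
```

The UNK fallback is exactly where real taggers stumble on the noisy language of texts and social media posts, which is why human linguists check the output.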

In the markedness tagging stage, the system applies proprietary algorithms to sort the syntactic patterns according to whether they are simple or complex. Each sentence has a count of its own, and each document is the sum of all of those sentences. The counts are then normalized to the size of the document so all documents can be compared to one another. 
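The counting-and-normalizing step can be sketched as follows. The single feature used here (adverb tags per 100 words) is an invented stand-in for ALIAS's proprietary markedness categories; only the per-sentence counting, document summing and size normalization mirror the description above.

```python
# Sketch of per-sentence counting and document-size normalization.

def sentence_counts(tagged_sentence):
    """Count a toy feature: number of adverb tags in one sentence."""
    return sum(1 for _, tag in tagged_sentence if tag == "ADV")

def document_profile(tagged_sentences):
    """Sum the per-sentence counts, then normalize per 100 words."""
    total_words = sum(len(s) for s in tagged_sentences)
    total_adv = sum(sentence_counts(s) for s in tagged_sentences)
    return 100.0 * total_adv / total_words if total_words else 0.0

doc = [[("He", "PRON"), ("ran", "VERB"), ("very", "ADV"), ("quickly", "ADV")],
       [("She", "PRON"), ("slept", "VERB")]]
print(document_profile(doc))  # adverbs per 100 words
```

Normalizing by document size is what lets a 200-word note be compared directly against a 2,000-word known sample.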

“That’s important because everything gets treated the same way,” Chaski said. “The ALIAS SynAID method is completely objective. Every document gets measured exactly the same way.”

ALIAS SynAID uses a statistical classifier, called discriminant function analysis, to build a statistical model that can tell suspect author 1 from suspect author 2. 

“Normally, we get over 95 percent accuracy telling the suspect authors apart,” Chaski explained. “Then, we use that model and apply it to the questioned document to figure out, ‘given the differences between these two suspect writers, who is most likely the writer of the questioned one?’”
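Discriminant function analysis itself requires a statistics package, but the underlying idea (build a profile of each suspect author from known documents, then assign the questioned document to the nearer profile) can be sketched with a simple nearest-centroid stand-in. The feature vectors below are invented numbers, not real ALIAS measurements.

```python
# Nearest-centroid stand-in for the two-author classification step.
import math

def centroid(vectors):
    """Mean feature vector of one author's known documents."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Normalized feature vectors (e.g., two syntactic features per 100 words)
# for each suspect's known documents -- invented for illustration.
author1_docs = [[3.1, 1.2], [2.9, 1.0], [3.3, 1.4]]
author2_docs = [[1.0, 2.8], [0.8, 3.1], [1.2, 2.6]]
questioned = [3.0, 1.1]

c1, c2 = centroid(author1_docs), centroid(author2_docs)
verdict = ("author 1" if distance(questioned, c1) < distance(questioned, c2)
           else "author 2")
print(verdict)
```

A true discriminant function analysis additionally weights the features to maximize separation between the two authors, which is where the reported 95-percent-plus accuracy comes from.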

ALIAS SynAID can even reach that level of accuracy with text messages, despite their typically short length. For example, in a recent analysis, Chaski took 3,000 text messages from two authors, including one-word and one-letter messages like “K” and “Yes.” She then merged the text messages into bundles of 100 time-adjacent messages and ran a discriminant function analysis. ALIAS achieved 96 percent accuracy in distinguishing the two different people sending texts. 
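The bundling step Chaski describes reduces to simple chunking of time-ordered messages; the placeholder strings below stand in for real texts.

```python
# Merge time-adjacent text messages into bundles of 100 so each
# bundle is long enough to support a statistical profile.

def bundle(messages, size=100):
    """Join consecutive messages into bundles of the given size."""
    return [" ".join(messages[i:i + size])
            for i in range(0, len(messages), size)]

msgs = [f"msg{i}" for i in range(250)]  # placeholder messages in time order
bundles = bundle(msgs)
print(len(bundles), len(bundles[0].split()))
```

Even a one-letter message like “K” contributes to its bundle, so no data is discarded before modeling.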

“As long as we have enough to build the statistical model—the 2,000 words or 100 sentences—as long as we’ve got that, then it has been successful on even short documents,” Chaski said. 

ALIAS SynAID has various modules depending on the type of questioned document, but Chaski points to SNARE, the suicide note classification module, as especially important.

“I wanted to make tools for law enforcement and security investigators,” Chaski said. “I had such a great relationship with the detectives in [the Michael Hunter case]. That really propelled me into thinking, ‘you need tools.’ We expect law enforcement officers and crime scene analysts to do things that are extremely hard to do, even by experts.”

For instance, psychiatrists are between 50 and 70 percent accurate in identifying a genuine suicide note. Yet when police officers come upon a body, they are expected to know instantly whether the note beside it is real. 

SNARE is 86 percent accurate for notes under 45 words and 81 percent accurate for notes over 45 words. It has a 14 to 20 percent error rate. 

“It’s better than the psychiatrists, and it’s an objective tool police officers can use,” Chaski said.