A copy of the Bixby letter. Photo: University of Manchester

A team of forensic linguistics experts led by the University of Manchester’s Dr. Andrea Nini believes it has solved a longstanding mystery about a letter Abraham Lincoln may have written.

In November 1864, a woman named Lydia Bixby supposedly received a letter from then-President Lincoln offering his condolences after she lost more than one son to the Civil War. (According to the story, Bixby was actually a Confederate sympathizer and burned the letter immediately after reading it.)

The letter, however, became famous as one of the best-written letters in the history of the English language. But questions have always surrounded it, especially about its authorship. For years, historians have argued that John Hay, Lincoln’s personal secretary (and future Secretary of State), actually wrote the letter to Ms. Bixby.

In a paper recently submitted to the journal Digital Scholarship in the Humanities, Nini and his team say a new technique they developed leads them to believe the letter was “almost certainly” written by Hay, not Lincoln.

N-gram analysis is a method commonly used in linguistics to determine authorship. An n-gram is a contiguous sequence of n elements (n can be one or more), such as words, characters, or grammatical structures.
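To make the definition concrete, here is a minimal Python sketch that extracts word and character n-grams from a text. The function names `word_ngrams` and `char_ngrams` are hypothetical, and the tokenization (lowercasing and splitting on whitespace, with punctuation left attached to words) is a simplifying assumption; real stylometric tools tokenize more carefully.

```python
def word_ngrams(text: str, n: int) -> list[tuple[str, ...]]:
    """Return the word n-grams of `text` in order of appearance.

    Assumes naive tokenization: lowercase, then split on whitespace.
    """
    tokens = text.lower().split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text: str, n: int) -> list[str]:
    """Return the character n-grams of `text`, spaces and punctuation included."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


# Word bigrams (n = 2): each adjacent pair of words.
print(word_ngrams("The letter was written", 2))
# [('the', 'letter'), ('letter', 'was'), ('was', 'written')]

# Character trigrams (n = 3): each run of three characters.
print(char_ngrams("abcd", 3))
# ['abc', 'bcd']
```

Setting `n` to 1 gives single words or single characters; larger values capture longer habitual phrases.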

But the traditional n-gram method works reliably only on longer texts, typically at least 500 to 1,000 words; with less text, the relative frequencies of the linguistic features cannot be estimated accurately. This posed a problem for the 139-word Bixby letter. But not for Nini.
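Frequency-based comparison rests on relative frequencies: how often a feature occurs divided by the length of the text. The hypothetical helper below computes word relative frequencies; it is a sketch of the general idea, not the code of any particular attribution tool.

```python
from collections import Counter


def relative_frequencies(tokens: list[str]) -> dict[str, float]:
    """Map each distinct token to its count divided by the total number of tokens."""
    if not tokens:
        return {}
    counts = Counter(tokens)
    total = len(tokens)
    return {token: count / total for token, count in counts.items()}


freqs = relative_frequencies("the cat and the dog".split())
print(freqs["the"])  # 0.4  (2 of 5 tokens)
print(freqs["cat"])  # 0.2  (1 of 5 tokens)
```

The arithmetic shows why short texts are a problem. In a 139-word letter, a word that occurs once already has a relative frequency of 1/139 (about 0.0072), the next possible value is 2/139, and most features simply register as zero, so the estimates are too coarse to compare reliably against an author's long-run rates.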

Building on the general n-gram approach, Nini developed a new method called n-gram tracing. Rather than counting how often n-grams occur, it looks at whether particular n-gram sequences are present or absent to establish authorship.

“We looked at all of the n-grams size two, so all of the two-word combinations in the letter,” Nini explained to Forensic Magazine. “We checked whether these n-grams were present or absent in one author or the other. This method is new, it’s something we tried and saw it worked very well for solving this particular problem.”
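The presence-or-absence idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the team's published procedure: it assumes each candidate is scored by the share of the disputed text's distinct word bigrams that also appear somewhere in that candidate's reference corpus, that the higher score wins, and that a tie counts as inconclusive. The names `bigram_types`, `trace_score`, and `attribute` are hypothetical.

```python
from typing import Optional


def bigram_types(text: str) -> set[tuple[str, str]]:
    """Distinct word bigrams in `text`; using a set discards counts, so only presence matters."""
    tokens = text.lower().split()
    return set(zip(tokens, tokens[1:]))


def trace_score(disputed: str, reference: str) -> float:
    """Share of the disputed text's bigram types that also appear in the reference corpus."""
    disputed_bigrams = bigram_types(disputed)
    if not disputed_bigrams:
        return 0.0
    shared = disputed_bigrams & bigram_types(reference)
    return len(shared) / len(disputed_bigrams)


def attribute(disputed: str, candidates: dict[str, str]) -> Optional[str]:
    """Name the candidate with the highest score, or None if the top score is tied."""
    scores = {name: trace_score(disputed, corpus) for name, corpus in candidates.items()}
    best = max(scores.values())
    leaders = [name for name, score in scores.items() if score == best]
    return leaders[0] if len(leaders) == 1 else None


# Toy corpora standing in for each author's known writing (hypothetical data).
candidates = {
    "Author A": "the war is over and the men are home",
    "Author B": "we shall meet again at the station",
}
print(attribute("the men are home", candidates))  # Author A
```

Because the score only asks whether each bigram ever occurs in a candidate's writing, a single occurrence in a large reference corpus counts as much as a hundred, which is what lets the approach work on a disputed text too short to yield stable frequencies.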

Using reference material of 1,080 texts (400,000 words) by Lincoln and 577 texts (260,000 words) by Hay, the method identified Hay as the author of the Bixby letter 90 percent of the time; in the remaining cases, the analysis was inconclusive.

That said, Nini acknowledges that more work is needed to validate n-gram tracing. The Bixby case was relatively simple because there were only two candidate authors, so he and his team are now working to establish the method's limits. They are taking short passages, about 200 words each, from famous novels to see whether the method can correctly attribute each passage to its author from among as many as 20 candidates.
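An evaluation of the kind described (short passages, many candidates) could be organized roughly as below. This is a hypothetical sketch, not the team's experimental code: `chunks`, `attribute`, and `evaluate` are illustrative names, the 200-word default comes from the article, and the scoring rule is an assumed presence/absence overlap (share of a passage's distinct word bigrams found in each candidate's reference corpus, ties treated as inconclusive).

```python
from typing import Optional


def chunks(text: str, size: int = 200) -> list[str]:
    """Split `text` into consecutive passages of `size` words; the last may be shorter."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def bigram_types(text: str) -> set[tuple[str, str]]:
    """Distinct lowercase word bigrams (presence only, counts discarded)."""
    tokens = text.lower().split()
    return set(zip(tokens, tokens[1:]))


def attribute(passage: str, candidates: dict[str, str]) -> Optional[str]:
    """Candidate whose corpus contains the largest share of the passage's bigram types; None on a tie."""
    target = bigram_types(passage)
    scores = {
        name: len(target & bigram_types(corpus)) / len(target) if target else 0.0
        for name, corpus in candidates.items()
    }
    best = max(scores.values())
    leaders = [name for name, score in scores.items() if score == best]
    return leaders[0] if len(leaders) == 1 else None


def evaluate(test_passages: list[tuple[str, str]], candidates: dict[str, str]) -> dict[str, float]:
    """Fraction of (true_author, passage) pairs attributed correctly, wrongly, or inconclusively.

    Test passages must be held out of the candidates' reference corpora,
    otherwise every passage trivially matches its own source.
    """
    if not test_passages:
        raise ValueError("need at least one test passage")
    tally = {"correct": 0, "wrong": 0, "inconclusive": 0}
    for true_author, passage in test_passages:
        guess = attribute(passage, candidates)
        if guess is None:
            tally["inconclusive"] += 1
        elif guess == true_author:
            tally["correct"] += 1
        else:
            tally["wrong"] += 1
    return {outcome: count / len(test_passages) for outcome, count in tally.items()}
```

With 20 candidates instead of two, ties and near-misses become more likely, which is one reason a method that works cleanly on a two-author problem still needs testing under harder conditions.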

“I wouldn’t say n-gram tracing works well all of the time,” Nini said. “For this case, it worked really well. The work we’re doing now is seeing how well it can work with all sorts of other constraints. We aim to get this done ASAP and publish the results to disseminate to the field. We’ll know [more] after we run enough tests.”

One mystery down, one to go

In 1888, hundreds of letters claiming to come from London serial killer Jack the Ripper and describing the murders were sent to police and others involved in the case. Three letters in particular stand out. In the more than a century since, the prevailing theory has been that the killer did not write any of the letters; instead, journalists wrote them to drum up newspaper circulation.

Nini has been working on the case for years and has just submitted a paper that is now under review for publication. His goal was not to identify who wrote the letters but to determine whether any of them could be linked to one another as the work of the same author.

Applying a method similar to n-gram tracing, Nini believes he has found “strong linguistic evidence that confirms two letters were written by the same person,” and that the controversial third letter was also written by that individual.
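The article describes the Ripper analysis only as "similar to n-gram tracing," so the following is an illustration of the general idea rather than Nini's method. One simple way to ask whether letters might share an author is to compare every pair by the overlap of their word-bigram sets. This sketch swaps in the Jaccard coefficient (shared bigram types divided by all bigram types in either letter) as the similarity measure; the names `bigram_types`, `jaccard`, and `pairwise_similarity` are hypothetical.

```python
from itertools import combinations


def bigram_types(text: str) -> set[tuple[str, str]]:
    """Distinct lowercase word bigrams in `text`."""
    tokens = text.lower().split()
    return set(zip(tokens, tokens[1:]))


def jaccard(a: set, b: set) -> float:
    """Size of the intersection over size of the union; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_similarity(letters: dict[str, str]) -> list[tuple[str, str, float]]:
    """Jaccard similarity for every pair of letters, most similar pair first."""
    sets = {name: bigram_types(text) for name, text in letters.items()}
    pairs = [(a, b, jaccard(sets[a], sets[b])) for a, b in combinations(sorted(sets), 2)]
    return sorted(pairs, key=lambda pair: pair[2], reverse=True)


# Toy letters (hypothetical data): x and y share most of their phrasing.
letters = {
    "x": "i keep on writing letters",
    "y": "i keep on writing",
    "z": "nothing in common here",
}
for a, b, score in pairwise_similarity(letters):
    print(a, b, score)
# x y 0.75
# x z 0.0
# y z 0.0
```

A raw similarity score is only a starting point: in a case with many hoaxers imitating one famous letter, shared surface phrasing is expected, so any real analysis has to separate copied wording from the less conscious habits that point to a common author.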

“The key thing is the letters imitate the first famous letter in the Jack the Ripper case,” Nini said. “This letter was published in newspapers, so many hoaxers copied it and pretended to be Jack the Ripper. It’s a nice test case because we’ve got one or two original letters published in the papers, and then 190 hoaxers trying to pretend to be this person.

“It’s nice because we can test questions such as ‘to what extent can people imitate style?’,” he continued. “My analysis actually shows people are bad so [they] tend to imitate very surface things, so when it comes to deeper things, the imitation breaks down. That’s a very important finding for forensic linguistics in general.”
