Eyewitnesses commonly help investigators identify potential suspects, but “ear” witnesses who hear a perpetrator but do not see them may also be helpful in criminal investigations. A certain spoken phrase remembered by a witness or victim, or an audio recording of a ransom call or a crime taking place, could be valuable evidence when other identifying information about a suspect is unknown. Investigators have also used automatic systems to analyze and compare voices—but when a speaker disguises their voice, verifying their identity can become much more difficult.

Researchers from the University of Eastern Finland recently conducted a study, published in the journal Speech Communication, to compare the accuracy of automatic speaker verification (ASV) systems and human listeners when attempting to recognize voices disguised to sound older or younger. The researchers also examined how this voluntary voice manipulation changed the fundamental characteristics of the sound produced by the speakers.

Their tests revealed that the performance of human listeners correlated with the performance of the automatic system, as trials deemed “easy,” “intermediate” or “difficult” by the automatic system lined up with the number of errors made by human subjects attempting the same trials.

On average, listeners made 8.23 errors out of 24 trials, while a listener “panel” created by combining the answers of all participants for each trial did slightly better, with eight errors out of 24 trials. The eight trials scored as “easy” by the automatic system resulted in a total of 42 errors across all 70 participants (eight attempts per participant, 560 attempts in total), while the eight “intermediate” trials resulted in 230 errors and the eight “difficult” trials resulted in 306 errors.
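
To put the per-tier totals reported above on a common scale, the short Python sketch below converts them into error rates. It uses only the figures quoted in this article; the variable and function names are illustrative and are not taken from the study.

```python
# Figures quoted in the article; all names here are illustrative assumptions.
PARTICIPANTS = 70
TRIALS_PER_TIER = 8

# Total listener errors for each difficulty tier assigned by the automatic system.
errors_by_tier = {"easy": 42, "intermediate": 230, "difficult": 306}

# Each participant attempted every trial in a tier: 70 * 8 = 560 attempts.
attempts_per_tier = PARTICIPANTS * TRIALS_PER_TIER


def error_rates(errors, attempts):
    """Return the fraction of attempts that were errors, per tier."""
    return {tier: count / attempts for tier, count in errors.items()}


rates = error_rates(errors_by_tier, attempts_per_tier)
for tier, rate in rates.items():
    print(f"{tier}: {rate:.1%}")
```

On these numbers, listeners erred on roughly 7.5 percent of “easy” attempts, about 41 percent of “intermediate” attempts and about 55 percent of “difficult” attempts.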

The experiment, which mostly included phrases spoken in Finnish, showed that the accuracy of both native Finnish listeners and non-native Finnish listeners decreased significantly when attempting to identify disguised voices versus non-disguised voices. Non-native listeners gave more false positives than native Finnish listeners did, though the authors concluded that being a native speaker was not a substantial help in recognizing disguised voices. Furthermore, the authors found no particular pattern in listeners’ abilities to recognize voices based on the listener’s age, sex, musical training, or education and work in linguistics.

In performing an acoustic analysis of the sound produced by the speakers when disguising their voices, the researchers found that the fundamental frequency of the speakers’ voices tended to increase with both the young voice and old voice disguises. Previous research noted by the authors showed that it was easier for subjects to increase their fundamental frequency, creating a higher pitch, than it was for them to decrease it. This voice disguise “strategy” observed by the researchers could inform further examination of the common tactics used by those attempting to conceal their identity.

“A step forward towards more robust speaker verification against voice disguise, whether performed by humans or ASV systems, would be to consider the vocal parameters that are more commonly modified by speakers avoiding identification,” the authors wrote.