Back when commuting was a thing, nearly 200 New Yorkers would cram themselves into a subway car during rush hour. Say you are one of those 200 and you steal a quick glance around the car before it arrives at your station. Now, imagine having the ability to connect one of those faces with grainy CCTV footage or old mugshots available in a police database.
It’s not a common ability, but it does exist. Super recognizers, as they are called, are people who possess significantly better-than-average face recognition ability, often able to memorize and recall thousands of faces after seeing them only once. London’s police force employs a unit of super recognizers, but such people remain rare.
Since 2018, National Institute of Standards and Technology (NIST) researcher Jonathan Phillips and his multi-institutional team have been delving into the accuracy of super recognizers, forensic examiners and facial recognition algorithms. Phillips’ most recent work, which he presented last week at a virtual conference, indicates there is no statistical difference between examiners and super recognizers. And while both are good at their jobs, fusing human examiners with a facial recognition algorithm provides the best outcome.
For the study, Phillips identified four distinct subject groups—facial forensic examiners, super recognizers, fingerprint examiners with no face experience and undergraduate students. The research team also chose four neural network-based algorithms to examine 20 pairs of face images that were prescreened to be “extremely challenging.”
While the undergraduate group was brought in as a general population proxy, the inclusion of forensic fingerprint examiners was an interesting choice.
“It could be that anyone who is used to doing pattern matching, or has experience and training with pattern matching in forensics, could perform as well as examiners. So, we used them as a forensic control group,” Phillips explained during his presentation.
Statistical analysis of the study results revealed that examiners and super recognizers performed the same as a group. Additionally, face examiners performed better than fingerprint examiners, indicating that face-specific training and experience is indeed important. With the exception of the student group, each of the three remaining groups had at least one person who achieved a perfect score on the test.
The algorithms followed a similar trend line. Algorithm A2015 was comparable to the average student, A2016 was comparable to fingerprint examiners, A2017a was comparable to the median super recognizer and A2017b was comparable to the best face examiners.
The second part of the study combined the performance of human examiners and algorithms to quantify results from that unique perspective. To do this, Phillips and team averaged the scores between examiners and algorithms in a process called “fusion.”
“The reason we picked fusion was because there is classic work in computer vision where just taking the average gets you where you need to go,” explained Phillips. “Also, in our case, it doesn’t risk overgeneralization on a relatively small set of data so we are making few assumptions.”
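The averaging step Phillips describes can be sketched in a few lines. The scores, scale and function names below are illustrative assumptions, not data or code from the NIST study; the only idea taken from the source is that fusion here means a simple mean of a human's and an algorithm's judgment on the same image pair.

```python
def fuse_scores(examiner_score: float, algorithm_score: float) -> float:
    """Fuse one examiner's judgment with one algorithm's score for a
    single face-image pair by taking their simple average. Assumes both
    judgments are on the same scale (hypothetically, -3 for "certainly
    different people" to +3 for "certainly the same person")."""
    return (examiner_score + algorithm_score) / 2


# Hypothetical judgments for five image pairs on the -3..+3 scale:
examiner_scores = [2.0, -1.5, 0.5, 3.0, -2.0]
algorithm_scores = [1.0, -2.5, 1.5, 2.0, -3.0]

fused = [fuse_scores(e, a) for e, a in zip(examiner_scores, algorithm_scores)]
print(fused)  # [1.5, -2.0, 1.0, 2.5, -2.5]
```

The appeal of plain averaging, as Phillips notes, is that it adds no tunable parameters, so there is nothing to overfit on a small dataset.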
One examiner fused with A2017b—the algorithm that performed best on its own—produced the best results. However, one examiner fused with A2017a, and one examiner fused with A2016, each performed statistically the same as two human examiners working together.
“The results have been of interest in the community in terms of how to integrate algorithms and examiners in the process, and then have examiners write the report and testify in court that this is how they combined the two,” Phillips said.
Based on the study results, Phillips and his team are now looking to develop a deeper understanding of examiners and the qualities they possess. For example, are examiners better when they look at a face quickly or when they study it over time? Additionally, how do comparisons across races affect accuracy? What about disguises, or even face masks in the era of COVID-19?
The team is also looking into the need for different sets of face recognition tests based on participant ability, as well as what differentiates examiners and super recognizers from each other and the general population.
Photo: A violin plot showing the results of Jonathan Phillips’ study on the performance of face examiners, face reviewers, super recognizers, students and four algorithms.