Both simple and complex backgrounds with a variety of disguises. Photo: Singh et al.

Facial recognition and identification are hard enough. Throw in suspects donning sunglasses, hats, scarves, wigs, glasses, etc., and it becomes a completely different animal.

Current state-of-the-art methods boast a 78 to 81 percent accuracy rating. But researchers from the University of Cambridge (UK), the National Institute of Technology (India) and the Indian Institute of Science have proposed and tested a new platform that can achieve an 85 percent accuracy rating.

The new method, called Spatial Fusion Convolutional Network, is twofold. First, the framework works to detect 14 facial "key points" that the literature has identified as essential to facial identification. The novel element is the second part: the introduction of two brand-new annotated facial disguise datasets.

The databases generally used for disguise-related research contain a small number of images with limited disguise variations. The two databases developed and proposed by the researchers, however, contain 2,000 images each with "simple" and "complex" backgrounds. The enlarged image databases were necessary since the researchers' framework is based on a deep convolutional network, a machine-learning model built from many stacked layers of learned feature detectors, with each layer operating on the outputs of the one below it.
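To make the detection idea concrete, here is a minimal sketch, far simpler than the paper's actual architecture: a single convolution produces a response "heatmap" over the image, and a key-point estimate is read off at the location of the strongest response. The function names and the single hand-set kernel are illustrative assumptions, not from the paper.

```python
import numpy as np

def conv2d(image, kernel):
    """Naive 2D convolution (valid mode): slide the kernel over the
    image and record the dot product at each position."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def detect_key_point(image, kernel):
    """Return the (row, col) where the kernel responds most strongly;
    a trained network would use many learned kernels instead of one."""
    heatmap = conv2d(image, kernel)
    return np.unravel_index(np.argmax(heatmap), heatmap.shape)
```

A real key-point detector stacks many such convolutional layers with learned kernels, but the detect-by-maximum-response step is the same idea.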

The network then “learned” from the photos in the dataset. The 14 facial key points are connected to form a star-net structure. The orientations between the connected points are then aligned by the Spatial Fusion Convolutional Network—placing facial key points of all neighboring frames into one specific frame. By scanning across these neighboring images, the network can estimate the facial key points for a target image.
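The star-net idea above can be sketched in a few lines: pick a central key point, connect every other point to it, and record the orientation of each edge. The point names and coordinates here are illustrative assumptions, not the paper's actual 14 key points.

```python
import math

# Hypothetical facial key points as (x, y) pixel coordinates.
key_points = {
    "nose_tip": (64, 70), "left_eye": (48, 52), "right_eye": (80, 52),
    "left_brow": (44, 44), "right_brow": (84, 44), "mouth_left": (52, 90),
}

def star_net_orientations(points, center_name="nose_tip"):
    """Connect every key point to a central point, forming a star,
    and return the orientation (radians) of each edge."""
    cx, cy = points[center_name]
    orientations = {}
    for name, (x, y) in points.items():
        if name == center_name:
            continue
        orientations[name] = math.atan2(y - cy, x - cx)
    return orientations
```

Aligning these edge orientations across neighboring frames is what lets the network transfer key-point estimates from frames where the points are visible to a target frame where some are occluded by a disguise.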

According to the paper, this type of deep convolutional network yields more accurate key-point predictions than competing network designs.

In the study, the framework's average key point detection accuracy was 85 percent for images with a simple background, and 74 percent for those with a complex background.

The highest detection accuracy achieved was 90 percent, based on a cap disguise and simple background dataset.

The other simple-background results are as follows:

  • Scarf: 77 percent
  • Cap and scarf: 69 percent
  • Cap, scarf and glasses: 55 percent

The authors acknowledge, however, that the accuracy takes a substantial hit when the background is complex.

Those accuracy percentages are:

  • Cap: 83 percent
  • Scarf: 67 percent
  • Cap and scarf: 56 percent
  • Cap, scarf and glasses: 43 percent

The authors attribute the drop to the network failing to detect critical key points in the outer region of the face, due to interference from background clutter.
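The size of that hit can be quantified directly from the per-disguise numbers reported above; the arithmetic here is ours, not the paper's.

```python
# Reported key-point detection accuracies (percent) by disguise.
simple = {"cap": 90, "scarf": 77, "cap+scarf": 69, "cap+scarf+glasses": 55}
complex_bg = {"cap": 83, "scarf": 67, "cap+scarf": 56, "cap+scarf+glasses": 43}

# Accuracy lost when moving from a simple to a complex background.
drops = {k: simple[k] - complex_bg[k] for k in simple}
avg_drop = sum(drops.values()) / len(drops)  # percentage points
```

Every disguise loses accuracy on complex backgrounds, and the average drop is over ten percentage points.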

So while there is still work to be done, the framework established in the paper is shown to outperform current methods.

Regardless, the large number of images and disguises introduced in the new datasets will help improve the future training of deep learning networks as the technology continues to mature.