The NSA is processing millions of facial images from intercepted communications as part of a program to build a global identity database of persons of interest, documents from the Edward Snowden cache seen by the New York Times have revealed.

The number of images passing through the system ran into millions per day, the documents said, of which 55,000 were “facial recognition quality images” [i.e. able to recognise individuals from various angles], which the NSA document praises for their “tremendous untapped potential.”

The document dates from 2011 and discusses an older program. Facial recognition technology has moved on since then, and the state and size of the system in 2014 is anyone’s guess. Whether the potential has been tapped is a matter of conjecture, but the program is likely to have grown.

“It’s not just the traditional communications we’re after: It’s taking a full-arsenal approach that digitally exploits the clues a target leaves behind in their regular activities on the net to compile biographic and biometric information” that can help “implement precision targeting,” said a document quoted by the NYT, perfectly summarising the principle at work.

It’s the sort of revelation that could easily be misunderstood as a general snooping on the pictures and videos posted every day by ordinary members of the public, but a deeper dive into the NYT report suggests that the program, started around 2010, is far more targeted.

The NYT report makes clear that the image analysis program was accelerated after Nigerian Umar Farouk Abdulmutallab tried to blow up a Detroit-bound plane in 2009 and Faisal Shahzad attempted to car-bomb Times Square in May 2010.

From a 2014 perspective, it would actually be more surprising if the NSA and FBI weren’t doing this sort of analysis. The bigger question is probably less what the system is doing than who it is doing it to.

The NYT story suggests that the images are not ‘found’ on the Internet so much as eavesdropped from intercepts, including videoconferences, foreign databases of individuals and airline data. These images become more useful and significant because they can be related to specific communications and events.

It was this cross-referencing – the ability to connect an image or images of the same person in apparently different guises – that had led to the analytical use of images by the NSA. As of 2011, the NSA appeared able in some circumstances to pinpoint where the images were taken using satellite maps.

On the other hand, in February the US and UK intelligence agencies were revealed through separate Snowden documents to have collected 1.8 million webcam images from Yahoo users as far back as 2008 through a program called ‘Optic Nerve’. That system also used image recognition and appeared to be grabbing every image it could get hold of, regardless of whether the user was under surveillance or not.

The legitimate concern isn't that the NSA cares about every picture of a person with their pet cat - these are after all deliberately made public - but that it is now building a system capable of relating that image to every other one of the same person, including those gathered officially.

Given that the NSA is said to be using commercial technology, a good guess as to its future capabilities will be what is possible in that sector. A good example of what is now possible is an image-recognition forensic system from NetClean, which recently diversified from spotting Internet child porn into a general policing tool able to do some of what the NSA appears to have been investigating.

In conclusion, it isn't just the NSA that is looking into image collection and analysis; many police forces are now doing the same and there will be no holding this back.