When discussing inappropriate image detection, the most common question posed is “What about false positives?”. In this article, Andy Churley, of PixAlert, explains the issue of false positives in simple terms.
What is a false positive?
One definition of a false positive is: “A positive test result in a subject that does not possess the attribute for which the test is being conducted.” Applied to inappropriate images, typically digital pornography, a false positive is a benign image that is incorrectly reported as containing inappropriate content.
Conversely, a ‘false negative’ is an image containing inappropriate content that is not identified as inappropriate and therefore is not reported. When using software to scan file systems for inappropriate images, auditors are usually more concerned with false positives than false negatives for one simple reason: they can see and measure the false positives because they have been detected. False negatives, on the other hand, are images that have been missed by the scanning software and therefore are never presented to the auditor.
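The four possible outcomes of a scan can be written down in a few lines of Python. This is only an illustrative sketch: the function name `outcome` and the sample scan results are hypothetical and are not taken from any PixAlert product.

```python
from collections import Counter


def outcome(is_inappropriate: bool, was_reported: bool) -> str:
    """Name the outcome for one image, given the truth and the scanner's decision."""
    if was_reported:
        # The auditor sees these images, so errors here are visible and measurable.
        return "true positive" if is_inappropriate else "false positive"
    # The auditor never sees these images, so errors here go unnoticed.
    return "false negative" if is_inappropriate else "true negative"


# Hypothetical results: (image is really inappropriate, scanner reported it)
scan = [(True, True), (False, True), (True, False), (False, False), (False, False)]

counts = Counter(outcome(truth, reported) for truth, reported in scan)
for name, count in sorted(counts.items()):
    print(f"{name}: {count}")
```

Only the two "reported" outcomes ever reach the auditor, which is why false positives are the ones that get counted.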
The false positive / false negative balance
When creating software applications to find inappropriate images on computer systems, the application developer has to deal with the False Positive versus False Negative balance, such that the software only presents a human auditor with images that are likely to contain inappropriate content yet at the same time does not ignore images containing inappropriate content. At one extreme, an application could return every image encountered for the auditor to review. While this would definitely capture all inappropriate images, this is clearly unmanageable since the auditor would have to review every image on a computer system.
At the other extreme, an application might only return images which are definitely inappropriate. While this would greatly reduce the number of images viewed by the auditor, any inappropriate image about which the application had the slightest uncertainty would go unreported, so a large number would be missed. Therefore most applications aim for a ‘happy medium’ whereby they expect to return a manageable number of false positives but will also capture almost all the inappropriate images.
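The balance can be illustrated with a toy example in Python. It assumes the application gives each image a likelihood score between 0 and 1 and reports every image at or above a chosen threshold. The scores, the thresholds and the function name `review_load` are all invented for illustration.

```python
# Hypothetical (likelihood score, image is really inappropriate) pairs.
scored = [
    (0.98, True), (0.91, True), (0.74, False), (0.66, True),
    (0.52, False), (0.31, False), (0.12, False), (0.05, False),
]


def review_load(threshold: float) -> tuple[int, int, int]:
    """Return (images reported, false positives, inappropriate images missed)."""
    reported = [truth for score, truth in scored if score >= threshold]
    false_positives = sum(1 for truth in reported if not truth)
    missed = sum(1 for score, truth in scored if truth and score < threshold)
    return len(reported), false_positives, missed


# 0.0 reports everything, 0.95 reports only near-certain images, 0.6 sits between.
for threshold in (0.0, 0.6, 0.95):
    reported, false_positives, missed = review_load(threshold)
    print(f"threshold {threshold}: {reported} reported, "
          f"{false_positives} false positives, {missed} missed")
```

With this made-up data, a threshold of 0.0 sends all 8 images to the auditor, a threshold of 0.95 sends only 1 but misses 2 inappropriate images, and a threshold of 0.6 sends 4 and misses none.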
The realities of image analysis
When setting out to determine whether or not a network contains inappropriate images, it is important to recognize that:
• no software algorithm is 100 percent accurate;
• images are stored in hundreds of file types;
• a high proportion of files residing on the network will be capable of containing images;
• a human auditor will have to view and classify the suspect images.
In most cases, there will be a large number of suspect images which need to be reviewed by an auditor, many of which will be false positives. Therefore it is essential to be able to:
• Identify false positives quickly;
• Clear false positives en masse;
• Ensure the same false positives never appear in future scans.
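One way the last two requirements could be met is to remember a fingerprint of every image the auditor has cleared and to skip matching images in later scans. The Python sketch below assumes SHA-256 digests of the raw file bytes. It illustrates the general idea only and does not describe how PixAlert implements it. The class and function names are hypothetical.

```python
import hashlib


def digest(data: bytes) -> str:
    """Fingerprint an image by the SHA-256 hash of its raw bytes."""
    return hashlib.sha256(data).hexdigest()


class ClearedImages:
    """Remembers images an auditor has already marked as false positives."""

    def __init__(self) -> None:
        self._cleared: set[str] = set()

    def clear_many(self, images: list[bytes]) -> None:
        """Clear a whole batch of false positives in one step."""
        self._cleared.update(digest(data) for data in images)

    def needs_review(self, data: bytes) -> bool:
        """True unless this exact file has been cleared before."""
        return digest(data) not in self._cleared


def filter_suspects(suspects: list[bytes], cleared: ClearedImages) -> list[bytes]:
    """Drop suspects that were cleared in an earlier scan."""
    return [data for data in suspects if cleared.needs_review(data)]


# Stand-in byte strings play the role of image files.
cleared = ClearedImages()
first_scan = [b"company-logo", b"beach-photo", b"suspect-a"]
cleared.clear_many([b"company-logo", b"beach-photo"])  # auditor clears two en masse

second_scan = [b"company-logo", b"beach-photo", b"suspect-b"]
print(filter_suspects(second_scan, cleared))  # only the new suspect remains
```

An exact hash only recognises byte-identical files. A resized or re-encoded copy of a cleared image would have a different digest and would appear in the gallery again.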
PixAlert’s Approach
PixAlert’s image algorithms are tuned to look for multiple facets of an image and make a decision based upon statistical likelihood values that an image contains inappropriate content. In the case of digital pornography, PixAlert’s image algorithms look for key elements in seven facets of an image:
Such comprehensive analyses result in a published accuracy rate of around 95%.
Using Image Analysis on corporate networks
In order to calculate the false positive rate the following equation is used:
The best commercially available image algorithms have about a 5% False Positive rate. However, PixAlert has developed some image detection improvement techniques which give, on average, only a 1.03% False Positive rate. More relevant to an auditor are the following approximations:
• For every 5 files encountered, 1 file will hold an image;
• For every 85 images analysed, 1 suspect image will be returned to the application for review;
• For every 9 suspect images viewed, 1 will be inappropriate.
This means that when using PixAlert Auditor, a reviewer should expect to view 8 false positives for every illicit image in the gallery. However, it is crucial to remember that for every image sent to the gallery for review, 85 images have already been discarded by the application without a reviewer having to view them.
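The approximations above can be turned into a rough workload estimate. The Python sketch below applies the three ratios quoted in this article (1 image per 5 files, 1 suspect per 85 images, 1 inappropriate image per 9 suspects). The function name `expected_counts` and the example network size are hypothetical, and a real scan will not follow these averages exactly.

```python
FILES_PER_IMAGE = 5              # for every 5 files, 1 holds an image
IMAGES_PER_SUSPECT = 85          # for every 85 images, 1 suspect is returned
SUSPECTS_PER_INAPPROPRIATE = 9   # for every 9 suspects, 1 is inappropriate


def expected_counts(total_files: int) -> dict[str, float]:
    """Estimate what a reviewer should expect from a scan of total_files files."""
    images = total_files / FILES_PER_IMAGE
    suspects = images / IMAGES_PER_SUSPECT
    inappropriate = suspects / SUSPECTS_PER_INAPPROPRIATE
    return {
        "images": images,
        "suspects": suspects,
        "inappropriate": inappropriate,
        "false_positives": suspects - inappropriate,
    }


# Hypothetical network size, chosen so the ratios divide evenly.
estimate = expected_counts(382_500)
for name, value in estimate.items():
    print(f"{name}: {value:,.0f}")

# Share of all analysed images that reach the reviewer as false positives.
print(f"{estimate['false_positives'] / estimate['images']:.2%}")
```

For 382,500 files this gives 76,500 images, 900 suspects, 100 inappropriate images and 800 false positives: 8 false positives for each inappropriate image, with roughly 1% of all analysed images reaching the reviewer as false positives.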