Thursday, April 30, 2009

Auto-Identify Porn

Steve Hanov is a Computer Science major who put together a short piece on the automatic identification of pornographic images, for which there is considerable demand. Companies want to block their workers from it, both because they want their workers to focus on the job and because such images can run them afoul of sexual harassment laws. People with delicate sensibilities also wish to be protected from it themselves, or to keep their children from seeing it, and they buy filtering software for exactly that purpose.
When I entered school, one method for this already existed. An image's colors were compared against a chart of human skin tones; if more than, say, 40% of the image was skin-toned, the image was classified as pornographic (and blocked, sent to an admin, or whatever). Just one problem with that. Here's a picture of a famous man. See if you can spot the problem:
William Gates the 3rd, up close
Most of the picture is skin (his face), but it is clearly not pornography. So that method is likely to backfire occasionally.
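Out of curiosity, here's roughly what that naive check looks like in code. This is only a sketch: the RGB skin test and the 40% cutoff are illustrative guesses, not the values any real filter used.

```python
# A rough sketch of the naive skin-percentage filter described above.
# The RGB thresholds and the 40% cutoff are illustrative, not the actual
# values any commercial product used.
from PIL import Image

def looks_like_skin(r, g, b):
    """Crude RGB skin-tone test (one of many published heuristics)."""
    return (r > 95 and g > 40 and b > 20 and
            max(r, g, b) - min(r, g, b) > 15 and
            abs(r - g) > 15 and r > g and r > b)

def naive_porn_check(path, threshold=0.40):
    """Flag the image if more than `threshold` of its pixels are skin-toned."""
    pixels = Image.open(path).convert("RGB").getdata()
    skin = sum(1 for (r, g, b) in pixels if looks_like_skin(r, g, b))
    return skin / len(pixels) > threshold

# A tight head-shot of Bill Gates trips this just as surely as actual
# pornography -- which is exactly the problem.
```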
So another expert, David Forsythe, wrote a paper on this topic, which he called Finding Naked People. (Mr. Forsythe is known for exactly this kind of humor.) Mr. Forsythe noted that people have at most two arms and two legs, so if his program can trace a continuous skin-toned region from a face to an arm to a leg, the picture is probably of a naked person and therefore not allowed. The program understands roughly how human bodies are put together, but it would be fooled by, say, a person wearing a body stocking. Still, it was much better than the previous method, which got fooled by faces.
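For flavor, here's a toy version of that idea. It is nowhere near Forsythe's real algorithm, which fits a geometric model of limbs; this one just checks whether the skin-toned pixels form one connected blob spanning most of the image height, as a crude stand-in for "continuous skin from face down to a leg". The function names and the 0.7 cutoff are mine.

```python
# Not Forsythe's algorithm -- a drastic simplification of the same intuition:
# can you walk from the face down to the legs without ever leaving skin?
from collections import deque
from PIL import Image

def looks_like_skin(r, g, b):
    # Same crude RGB skin test as in the earlier sketch.
    return (r > 95 and g > 40 and b > 20 and
            max(r, g, b) - min(r, g, b) > 15 and
            abs(r - g) > 15 and r > g and r > b)

def tallest_skin_span(path):
    """Fraction of image height covered by the tallest connected skin blob."""
    img = Image.open(path).convert("RGB")
    w, h = img.size
    px = img.load()
    mask = [[looks_like_skin(*px[x, y]) for x in range(w)] for y in range(h)]
    seen = [[False] * w for _ in range(h)]
    best = 0.0
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                # Flood-fill one connected skin region, tracking its vertical extent.
                q = deque([(y, x)])
                seen[y][x] = True
                top = bottom = y
                while q:
                    cy, cx = q.popleft()
                    top, bottom = min(top, cy), max(bottom, cy)
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            q.append((ny, nx))
                best = max(best, (bottom - top + 1) / h)
    return best

def maybe_naked_person(path, span_cutoff=0.7):
    return tallest_skin_span(path) > span_cutoff
```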
So James Ze Wang wrote another program, WIPE, that checked shapes carefully, on the grounds that this would cut down on the false positives. It also combined five criteria, giving slightly more granularity to the process. After all, a person wearing a bathing suit should prompt a less serious response than a naked person, which in turn is less serious than a naked person having sex, and so on.
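Mr. Hanov doesn't list WIPE's five criteria, so the signals and weights in this sketch are placeholders of my own. The point is only the structure of the idea: several weak scores combined into a graded verdict instead of a single yes/no.

```python
# The criteria and weights below are invented placeholders, not WIPE's actual
# tests. The sketch shows the shape of the approach: combine weak signals,
# then map the total onto a graded scale of severity.
from dataclasses import dataclass

@dataclass
class ImageSignals:
    skin_fraction: float       # share of skin-toned pixels, 0..1
    figure_shape_match: float  # how well the outline matches a human figure, 0..1
    texture_smoothness: float  # smooth skin-like texture vs. fabric or grass, 0..1

def grade(sig: ImageSignals) -> str:
    """Map a weighted combination of the signals onto a graded verdict."""
    score = (0.4 * sig.skin_fraction +
             0.4 * sig.figure_shape_match +
             0.2 * sig.texture_smoothness)
    if score < 0.30:
        return "clean"
    if score < 0.50:
        return "swimwear"
    if score < 0.75:
        return "nudity"
    return "explicit"

# e.g. grade(ImageSignals(0.45, 0.40, 0.50)) -> "swimwear"
```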
Unfortunately for Mr. Wang, WIPE had far too many false negatives: it approved images that were, to a human, clearly pornographic. WIPE does make a good secondary filter with human intervention (that is to say, where a human double-checks WIPE's approvals).
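In code, that kind of human-in-the-loop pipeline might look something like this; `wipe_approves` and `is_actually_porn` are stand-ins for the classifier and the human reviewer, not anything from the actual system.

```python
# Anything the automatic filter rejects is blocked outright; anything it
# approves still lands in a queue for a human to double-check.
from collections import deque

review_queue = deque()

def filter_image(path, wipe_approves):
    """First pass: trust rejections, but queue approvals for human review."""
    if not wipe_approves(path):
        return "blocked"
    review_queue.append(path)
    return "approved-pending-review"

def drain_review_queue(is_actually_porn):
    """Second pass: a human pulls anything the machine wrongly approved."""
    taken_down = []
    while review_queue:
        path = review_queue.popleft()
        if is_actually_porn(path):
            taken_down.append(path)
    return taken_down
```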
Mr. Hanov then goes on to describe Google's involvement in the problem. When Google developed its image search, it had to ensure that pornographic images were not returned unless explicitly asked for. After all, families and other sensitive people use it. The Internet already has a filthy reputation; no need to make it worse. Just imagine.

Mom: Hey Mad Engineer, I'm doing a report on oral cancer, can you get me a picture to go with that?
Mad Engineer: Sure mom. Open google, image search, "Oral cancer..." ....Oh my.
Grandma: What are you kids up to OH SWEET JESUS BURN THE DEMON MACHINE!!!!!!

Since hundreds of thousands of people use Google image search every minute or so, the algorithm has to be fairly fast as well as effective. And you know that if it fails, people will complain. Google wrote another paper on its results.
I predict more developments on this in the future.
