July 11, 2005


We've been looking into various anti comment-spam measures recently. Kieran has done good work implementing support for the Movable Type blacklist system, which means that we can now block much more comment spam than we used to. However, it's still not perfect; it's in the nature of these things that new variants of spam inevitably arrive slightly ahead of new updates to the blacklist file, so (as we've already seen) spam still slips through from time to time.
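To make the limitation concrete, here is a minimal Python sketch of pattern-based blacklisting. It assumes a blacklist is simply a list of regular expressions matched against the comment text; the patterns and the function names `compile_blacklist` and `is_spam` are made up for illustration and are not taken from the Movable Type blacklist or from our implementation.

```python
import re

# Hypothetical patterns; a real blacklist file would hold many more entries.
BLACKLIST_PATTERNS = [r"cheap-pills\.example", r"\bfree\s+casino\b"]


def compile_blacklist(patterns):
    """Compile each pattern once, ignoring case."""
    return [re.compile(p, re.IGNORECASE) for p in patterns]


def is_spam(comment_text, blacklist):
    """Return True if any blacklisted pattern appears in the comment."""
    return any(rx.search(comment_text) for rx in blacklist)


blacklist = compile_blacklist(BLACKLIST_PATTERNS)
print(is_spam("Visit http://cheap-pills.example/ now", blacklist))  # True
print(is_spam("Visit http://cheap-pi11s.example/ now", blacklist))  # False
```

The second comment is a trivial variant of the first, yet it gets through until someone adds a new pattern, which is exactly the lag described above.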

One other approach that's sometimes used to try to prevent spam is the captcha, a graphic which shows some distorted letters and asks the user to type in the letters that they see; if the typed letters don't match the text in the image, the comment is rejected. The theory is that since most spam is sent by computers running scripts, they won't be able to pass the captcha test, so they won't be able to post spam. At the same time as Kieran was looking at better blacklisting, I was looking into captchas. Here's what I learned:

  1. I presumed "captcha" was just a catchy name for something that attempts to capture whether you're a human being or a script, but not so; it stands for Completely Automated Public Turing test to tell Computers and Humans Apart. Catchy. It's a trademark, too.

  2. One problem with captchas is that they are not in fact impossible for computers to solve. Various people have done work to demonstrate that captchas can be OCR'ed with very high success rates, though the success rate varies with the type of captcha. These people, for example, claim to be able to break the Yahoo captcha 92% of the time. Here's a list of common captchas with an assessment of their solvability.

  3. In general, making captchas harder for computers to solve also means making them harder for humans to read. Some people have done interesting work to make captchas which humans can read but computers would find difficult; here's an example of 3D lettering which (the authors believe) is human but not computer readable.

  4. Other people have come up with captcha variants which are not based around letters or numbers. Here's a captcha which draws shapes on a noisy background, here's one which asks you to click at the centre of distortion on a picture, and here's one which asks you to click on a shape which doesn't belong on its background.

  5. There's no strong evidence to suggest that spammers are actually using AI to solve captchas. It may in fact be cheaper just to use people to do the job: a human being can easily solve a hundred captchas an hour with a very high success rate.

  6. Cory Doctorow wrote about the whimsical idea that if you want a human to solve a captcha for you, you could do so by the ingenious approach of inviting people to come to your web site and offering them something they wanted – free porn or MP3s, say – and then protecting the content with a captcha which is in fact protecting some other site that you want access to. The visitor to your web site solves the captcha to get the porn, then you use the captcha to get whatever resource you wanted. It's ingenious, but again, there isn't any evidence to suggest that it's actually going on.

  7. Apart from being solvable in principle, if not in practice, the other big problem with captchas is that they aren't accessible. It's part of their design that they use tricky colour combinations, busy backgrounds, strange fonts and distorted lettering, so they are at best hard to read; if you're even slightly visually impaired, they quickly become unworkable for you. And since they're images, not text, you can't resize them or change their colour scheme.

  8. For that reason, some people have experimented with alternative means to protect their comment forms or sign-up forms or whatever. One clever chap has used Flash to make something which looks just like a comment form, but is in fact a little Flash applet which, when you click on part of it, submits the form together with a key which is buried inside the SWF file. The spambot has no idea that clicking on a special part of the Flash applet is how you submit this particular form, so it wanders away disappointed.

  9. But my favourite, and in some ways most trivial, variant on the captcha is one which is completely accessible and works like this: add one extra field to your comment box, and have a label next to it with a question such as "What is my name?" or "What colour is a banana?" or "What bird rhymes with carrot?". It's accessible, because the text can be resized, read out loud, etc., but it's hard for a bot to defeat because it isn't algorithmically solvable. Lots of people seem to have had this idea, but Eric Meyer's GateKeeper implementation is one that's often cited.

This last idea, it seems to me, is particularly well-suited to us because we could let everyone define their own question and answer, and make them as hard or as easy as they like. One special case of the Q&A system would be simply "Enter the password" with no clue as to what the password is; that way only people that you've actually told the password to can comment. And of course if bots start building up databases of common questions and answers, you can just change yours to something new.
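As a rough illustration of the per-blog question and answer idea, here is a minimal Python sketch. The class name `AntiSpamQuestion`, the `normalise` helper, and the decision to ignore case and stray whitespace are all assumptions made for the example; none of it describes GateKeeper or any existing implementation.

```python
import unicodedata
from typing import Iterable


def normalise(answer: str) -> str:
    """Fold case and collapse whitespace so '  Yellow ' matches 'yellow'."""
    folded = unicodedata.normalize("NFKC", answer).casefold()
    return " ".join(folded.split())


class AntiSpamQuestion:
    """One blog owner's question plus the answers they are willing to accept."""

    def __init__(self, question: str, accepted_answers: Iterable[str]):
        self.question = question
        self.accepted = {normalise(a) for a in accepted_answers}

    def check(self, submitted: str) -> bool:
        """Return True if the submitted answer matches an accepted one."""
        return normalise(submitted) in self.accepted


banana = AntiSpamQuestion("What colour is a banana?", ["yellow"])
print(banana.check("  Yellow "))  # True
print(banana.check("blue"))       # False
```

The "Enter the password" special case is the same check with a question that gives nothing away, and changing your question when bots catch on is just a matter of storing a new question and answer.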

If we can't beat spam by blacklisting, I like the Q&A idea.

Update 10th August: I'm tickled by this idea for asking users to enter large random numbers to help calculate pi. Nice blog design, too.

- 6 comments by 4 or more people

  1. Robert O'Toole

    I like the Q&A option. I could, for example, set a question on an academic entry with an answer that could only be known by someone with some expertise in that area.

    But it might seem a little odd to some readers. Perhaps if it were termed clearly as the "anti-spam question" they would get the idea?

    11 Jul 2005, 16:51

  2. I like the Q&A one as well. Could we also choose to ask our own questions? We could set questions to see if the people reading the entries had been paying attention!!

    I take it these extra measures are only going to be used on people making anonymous comments?

    11 Jul 2005, 20:14

  3. Steve Rumsby

    I remember the installation process for a piece of software stopping and asking a question, something like "What's the answer to this question?" The installation would not proceed without correctly answering the question. The answer was, of course, in the manual. Somewhere…

    11 Jul 2005, 22:37

  4. Yeah! I can remember that happening in the game 'Civilisation' (mac version I think) just before advancing a level. You'd get 3 questions along the lines of "this symbol represents which technological advancement?" then cue much flicking through the manual.

    I think it's a genius idea and, like Stephen and Robert, I'd love to be able to set my own question rather than have it randomly generated for example.

    12 Jul 2005, 10:49

  5. I think the question/answer solution looks interesting. You could combine that with an image captcha to make it more difficult for bots to build up a database.

    Perhaps generating some (light) arithmetic as an image and asking the user to input the answer would work well too.

    There are some interesting "features" you could get from allowing the blogger to set their own question, although a random question for each comment may be more secure.

    22 Jul 2005, 13:24

  6. Nathan Sircombe

    …I noticed what looks like a Nigerian scam type post on one blog…


    …I guess this might be the kind of crap you're hoping to stop.

    02 Aug 2005, 09:37
