CAPTCHA

16 years and 10 months ago

CAPTCHA is an acronym for "completely automated public Turing test to tell computers and humans apart", and is a type of test used to determine whether or not the user is human. There are several different methods, and the purpose of this post is to list and evaluate some of them, as well as point to some ways of circumventing CAPTCHAs.

Applications

CAPTCHAs are used to prevent bots from using various types of computing services. Applications include preventing bots from taking part in online polls, registering for free email accounts (which may then be used to send spam), and, more recently, preventing bot-generated spam by requiring that the (unrecognized) sender pass a CAPTCHA test before the email message is delivered.

There are different methods for implementing CAPTCHAs: they can be based on text, images, sound or logic puzzles. What they all have in common is the assumption that people will be able to solve them very quickly, but computers won't. Next, I will list some examples of different CAPTCHAs, and evaluate them on security and accessibility issues.

Text based CAPTCHAs

A common type of CAPTCHA requires that the user type the letters of a distorted image, sometimes with the addition of an obscured sequence of letters or digits that appears on the screen:

In order to prevent computer recognition, CAPTCHAs have to include a lot of background noise and vary the font and rotation of the letters, which can produce text that is very hard to read, even for humans:

In order to help people understand the text in the CAPTCHAs, some implementations use words taken from a dictionary. This helps people make out the hard-to-read letters in the text, but makes the CAPTCHA weaker against automated attacks, since the number of possible words is finite:
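To make the weakness concrete, here is a minimal Python sketch of how an attacker might exploit a finite word list. It assumes the attacker already has an imperfect OCR reading of the image; the word list, the function names and the `cutoff` value are all made up for illustration and are not taken from any real CAPTCHA.

```python
import difflib

# Hypothetical word list; a real attack would load the CAPTCHA's full dictionary.
DICTIONARY = ["garden", "window", "planet", "silver", "bridge", "candle"]


def snap_to_dictionary(noisy_reading, words=DICTIONARY, cutoff=0.6):
    """Return the dictionary word closest to an imperfect OCR reading, or None."""
    matches = difflib.get_close_matches(noisy_reading, words, n=1, cutoff=cutoff)
    return matches[0] if matches else None


def search_space(alphabet_size, length):
    """Number of possible random strings of a given length over an alphabet."""
    return alphabet_size ** length


# An OCR pass that misreads one letter can still be corrected by the word list.
print(snap_to_dictionary("p1anet"))
# Random 6-letter strings versus the number of words the attacker has to consider.
print(search_space(26, 6), "random strings vs", len(DICTIONARY), "dictionary words")
```

With random letters every character has to be read correctly, while with dictionary words a partly wrong reading is often enough.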

The Gimpy method

Gimpy works by choosing a certain number of words from a dictionary and displaying them corrupted and distorted in an image; Gimpy then asks the user to type some of the words displayed in that image. In the example below, the user only needs to type 3 of the 7 words that exist in the CAPTCHA:


Click here for a live demo
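The challenge logic described for Gimpy can be sketched in a few lines of Python. This is only an illustration of the "type 3 of the 7 words" rule: the word list and function names are hypothetical, and the rendering and distortion of the image are left out entirely.

```python
import random

# Hypothetical word list standing in for Gimpy's dictionary.
WORDS = ["apple", "river", "stone", "cloud", "tiger",
         "lemon", "chair", "piano", "sugar", "robot"]


def make_challenge(rng, shown=7):
    """Pick the distinct words that would be drawn, distorted, in the image."""
    return rng.sample(WORDS, shown)


def check_answer(challenge, typed, required=3):
    """Accept if the user typed at least `required` distinct words from the image."""
    typed_set = {word.strip().lower() for word in typed}
    return len(typed_set & set(challenge)) >= required


rng = random.Random()
challenge = make_challenge(rng)
print(check_answer(challenge, challenge[:3]))  # three correct words pass
print(check_answer(challenge, challenge[:2]))  # two are not enough
```

Counting distinct words means that typing the same word three times does not pass the test.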

Image based CAPTCHAs

A different approach is to use pictures to check for humans, since computers have great difficulty looking at a picture and understanding what's in it. The problem with this kind of approach is, again, the finite number of images used: depending on how many images are being used to implement the CAPTCHA, the attacker can grab all of them and "teach" his computer about them. To avoid this, people are trying to use freely available pictures from services such as Flickr or Zooomr.
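As a rough illustration of why a small, fixed image pool is a problem, the Python sketch below builds a lookup table from image fingerprints to labels. It assumes the images are served byte-for-byte unchanged; all names are hypothetical, and a CAPTCHA that distorts its pictures would defeat this exact-hash approach and force the attacker to use fuzzier matching.

```python
import hashlib


def fingerprint(image_bytes):
    """Exact fingerprint of an image file's contents."""
    return hashlib.sha256(image_bytes).hexdigest()


def build_lookup(labeled_images):
    """Map fingerprints to labels from (image_bytes, label) pairs the attacker collected."""
    return {fingerprint(data): label for data, label in labeled_images}


def solve(lookup, image_bytes):
    """Return the known label for an image, or None if it was never seen before."""
    return lookup.get(fingerprint(image_bytes))


# Stand-in byte strings instead of real image files.
collected = [(b"fake-horse-image", "horse"), (b"fake-table-image", "table")]
lookup = build_lookup(collected)
print(solve(lookup, b"fake-horse-image"))
print(solve(lookup, b"never-seen-image"))
```

Once every image in the pool has been labeled by hand, each new challenge becomes a dictionary lookup, which is why a large and changing source of pictures matters.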

PIX

PIX is a program that has a large database of labeled images. All of these images are pictures of concrete objects (a horse, a table, a house, a flower, etc). The program picks an object at random, finds 4 random images of that object from its database, distorts them at random, presents them to the user and then asks the question "what are these pictures of?".


Click here for a live demo
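The selection step described for PIX might look roughly like the Python sketch below. The tiny database, the file names and the function names are invented for illustration, and the random distortion of each picture is omitted.

```python
import random

# Hypothetical labeled database: object name -> image file names.
IMAGE_DB = {
    "horse": [f"horse_{i}.jpg" for i in range(6)],
    "table": [f"table_{i}.jpg" for i in range(6)],
    "house": [f"house_{i}.jpg" for i in range(6)],
    "flower": [f"flower_{i}.jpg" for i in range(6)],
}


def make_pix_challenge(rng, per_challenge=4):
    """Pick an object at random, then distinct random images of that object."""
    label = rng.choice(sorted(IMAGE_DB))
    images = rng.sample(IMAGE_DB[label], per_challenge)
    return label, images


def check_pix_answer(label, answer):
    """Compare the user's answer with the object name, ignoring case and spaces."""
    return answer.strip().lower() == label


rng = random.Random()
label, images = make_pix_challenge(rng)
print(images)
print(check_pix_answer(label, label.upper()))
```

A real implementation would also have to accept synonyms, since two people may name the same object differently.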

The Kittens test

This is probably the cutest CAPTCHA I have seen: the user is confronted with a 3x3 grid of animal pictures, and 3 of them are of kittens. The user must click the kittens:


Click here for a live demo
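A sketch of the grid logic in Python, together with the odds of passing by clicking at random, could look like this. The picture names and function names are placeholders, and the assumption that the user must select exactly the 3 kitten cells is mine; the real test may score clicks differently.

```python
import math
import random


def make_grid(rng, kitten_pics, other_pics, size=9, kittens=3):
    """Build a shuffled list of (picture, is_kitten) cells for a 3x3 grid."""
    cells = [(pic, True) for pic in rng.sample(kitten_pics, kittens)]
    cells += [(pic, False) for pic in rng.sample(other_pics, size - kittens)]
    rng.shuffle(cells)
    return cells


def check_clicks(cells, clicked):
    """Pass only if the clicked cell indexes are exactly the kitten cells."""
    expected = {i for i, (_, is_kitten) in enumerate(cells) if is_kitten}
    return set(clicked) == expected


def blind_guess_probability(size=9, kittens=3):
    """Chance that a bot picking `kittens` cells at random picks the right ones."""
    return 1 / math.comb(size, kittens)


rng = random.Random()
kitten_pics = [f"kitten_{i}.jpg" for i in range(5)]
other_pics = [f"animal_{i}.jpg" for i in range(10)]
grid = make_grid(rng, kitten_pics, other_pics)
print(len(grid), "cells")
print(blind_guess_probability())  # 1 in 84 for a 3x3 grid with 3 kittens
```

There are only 84 ways to pick 3 cells out of 9, so a bot that simply retries has a realistic chance of getting through unless attempts are limited.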

Audio based CAPTCHAs

CAPTCHAs based on reading text - or other visual-perception tasks - prevent visually impaired users from accessing the protected resource. However, CAPTCHAs do not have to be visual. Any hard artificial intelligence problem, such as speech recognition, can be used as the basis of a CAPTCHA. Some implementations of CAPTCHAs permit users to opt for an audio CAPTCHA. Take a look at this audio based CAPTCHA test to see how it works.

Circumvention

Since nothing is 100% secure, it's always possible to break a CAPTCHA, and that can be done in two ways: with or without the help of humans.

One ingenious crack is to offer a free porn site which requires the user to key in the solution to a CAPTCHA - which has been inlined from Yahoo or Hotmail - before the user can gain access. Free porn sites attract lots of users around the clock, and the spammers are able to generate CAPTCHA solutions fast enough to create as many throw-away email accounts as they want.

The other way to crack CAPTCHAs is to use OCR (optical character recognition) technology. The best-known project for defeating CAPTCHAs is probably PWNtcha, and a visit is mandatory in order to understand which types of CAPTCHAs are most vulnerable.

Implementing CAPTCHAs

If you want to implement a CAPTCHA in any of the following programming languages - ASP, C, ColdFusion, Java, .NET, Perl, PHP, Python, Ruby or Smalltalk - check the Wikipedia link.

Resources