Re-Capturing Books through Captcha…

A few days ago, Ed Warkentin, of Teach ‘Em How to Fish, sent me an e-mail describing something called reCAPTCHA. A captcha is a string of letters and numbers, distorted just a bit, that is displayed when you want to post a comment on a blog or submit some other information through a web page. If you type the letters and numbers in correctly, the software assumes that you are probably a human, and not an automated spambot seeking to populate the Internet with that noxious waste and…

So, according to the reCAPTCHA web site, about 60 million of these captchas are read and typed in by humans every day.

Enter the problem, not with captchas, but with our ongoing efforts to digitize the pre-digital record of humankind. 

Volunteers around the world are working to type in the great books, and the not-so-great works, that define our heritage. They work very hard, and they have a long way to go.

So, the School of Computer Science at Carnegie Mellon University takes a look and offers something interesting. With the help and support of Intel, Novell, The MacArthur Foundation, and ALADDIN/Olympus, and a variety of Open Source applications, Luis von Ahn, Ben Maurer, Mike Crawford, Ryan Staake, and Manuel Blum have devised a way for those of us who comment on blogs to help them digitize old books. Each time you click to comment on a participating blog, the service delivers an altered photograph of two words from the work currently being digitized. You read the two words, type them in, and then submit your comment. The words are compared with what others have typed in, and if they agree, the words are added to the work and your comment is posted. We are digitizing the great works, two words at a time.
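For readers curious about the "compare with what others typed" step, here is a minimal Python sketch of how agreement among readers might be tallied. It is an illustration under stated assumptions, not reCAPTCHA's actual code: the class name `WordConsensus`, the `min_agreements` threshold, the word IDs, and the lowercase/whitespace normalization are all hypothetical choices made here for clarity.

```python
from collections import Counter
from typing import Dict, Optional


class WordConsensus:
    """Hypothetical tracker: accepts a transcription of one scanned word
    once enough independent readers have typed the same thing."""

    def __init__(self, min_agreements: int = 3) -> None:
        # How many matching answers are needed before a word is accepted.
        # This is an illustrative value, not a published reCAPTCHA setting.
        self.min_agreements = min_agreements
        self._votes: Dict[str, Counter] = {}   # word_id -> tally of answers
        self.accepted: Dict[str, str] = {}     # word_id -> agreed transcription

    def submit(self, word_id: str, answer: str) -> Optional[str]:
        """Record one reader's answer for a scanned word.

        Returns the agreed transcription once enough readers match,
        otherwise None."""
        # Assumed normalization: ignore case and surrounding whitespace.
        guess = answer.strip().lower()
        if word_id in self.accepted:
            return self.accepted[word_id]
        tally = self._votes.setdefault(word_id, Counter())
        tally[guess] += 1
        if tally[guess] >= self.min_agreements:
            # Enough readers agree: add the word to the digitized text.
            self.accepted[word_id] = guess
            del self._votes[word_id]
            return guess
        return None


if __name__ == "__main__":
    consensus = WordConsensus(min_agreements=2)
    print(consensus.submit("page12-word7", "Heritage"))   # only one reader so far
    print(consensus.submit("page12-word7", "heritaqe"))   # a disagreeing reading
    print(consensus.submit("page12-word7", "heritage "))  # second matching reading
```

Requiring several matching answers is what lets many casual, error-prone readers add up to a reliable transcription: a single misreading such as "heritaqe" never reaches the threshold on its own.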

What intrigues me even more than this kind of harnessing of computer and human intellectual labor are the people who think this stuff up.

7 thoughts on “Re-Capturing Books through Captcha…”

  1. This perfectly illustrates one of the core themes of Don Tapscott’s book Wikinomics: collaboration over Web 2.0, or whatever else you want to call it.

    Even mindless collaboration, such as entering two seemingly random words into a text field in order to leave a comment on a blog, can create a workforce as powerful and as accurate as a team of professional transcribers, and the collaborative group is so much cheaper. This is the same new way of generating information that has allowed Wikipedia to surpass Britannica in scope and rival it in accuracy and authority, all within a few years.

    I’m confident that more people will discover ways to leverage the energy of billions of keystrokes and clicks that pour daily into the Web. Now, how can we harness this energy in school? Hmm.

  2. This is an absolutely brilliant idea, not only good for digitizing the books but also for generating blog comments. I’m more likely to leave a comment on a blog that uses reCaptcha because I know I’m contributing to a worthy cause. My question is this, though: if the books need digitizing, how did they get the words into reCaptcha? Didn’t someone or some machine already have to digitize them to get them into the reCaptcha system? I’ll have to visit the reCaptcha site to find out.

    Nick

  3. Wow! I’m leaving a comment just to try reCaptcha! Also, I enjoy your posts about education and technology 🙂
