How many of you have seen a reCAPTCHA? And how many of you know what a reCAPTCHA actually is? Probably a lot of people don't know why reCAPTCHA is used everywhere on the web nowadays. Most of the time, it appears during the registration or login process.
The CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) project was started about 10 years ago, and its next evolution is reCAPTCHA, which was acquired by Google.
About reCAPTCHA
The basic aim of reCAPTCHA is to verify that whoever is using a particular website is a human being and not an automated program.
This matters because an automated program can send millions of HTTP requests in a short time, flooding the website's registration or login forms with spam.
Consider a ticket-selling website: if the administrator creates a booking form, a bot could submit millions of ticket-booking requests at once, flooding the site with spam bookings.
Basically, a CAPTCHA shows the user a sequence of random, distorted characters to type. Users type about 200 million CAPTCHAs every day, and each one takes roughly 10 seconds, so altogether humans spend more than 500 thousand hours every day on CAPTCHAs.
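The daily figure follows from a quick back-of-the-envelope calculation using the two numbers above (200 million CAPTCHAs per day, about 10 seconds each):

```latex
\[
2 \times 10^{8}\ \text{CAPTCHAs/day} \times 10\ \text{s} = 2 \times 10^{9}\ \text{s/day}
\]
\[
\frac{2 \times 10^{9}\ \text{s/day}}{3600\ \text{s/h}} \approx 5.6 \times 10^{5}\ \text{h/day}
\]
```

That is roughly 556 thousand hours per day, consistent with the "more than 500 thousand hours" estimate.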
So is all this time simply wasted? No! Actually, this time is not a “WASTE”: this human effort is put to a great purpose, explained below. When a user types a CAPTCHA, their brain is doing something amazing that computers can't yet do reliably.
So in these 10-second chunks, each user is solving a problem that computers cannot yet solve. And do you know what that problem is? It is book digitization! So you are not only being authenticated as a human; you are also helping to digitize books.
Book Digitization
Now let me explain what exactly book digitization is. Several companies, such as Google and Amazon, run book-digitization projects. They start with an old physical book and scan it; here, scanning means taking a digital photograph of every page, i.e. one image per page of the book.
The next step is for the computer to identify the words and diagrams in each image. This technology is called OCR (Optical Character Recognition): it takes the picture of a page and tries to figure out what the text says. The problem is that OCR is not perfect.
On the faded or yellowed pages of old books, OCR often fails to recognize words correctly. Roughly speaking, OCR cannot recognize about 30% of the words in books that are more than 50 years old.
What reCAPTCHA does is take the words that the computer couldn't recognize and show them to users in CAPTCHAs. In other words, a word you type in a CAPTCHA may come straight from a book page that the computer failed to read.
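The selection step can be sketched in Python. This is a minimal illustration, not reCAPTCHA's actual pipeline: it assumes the OCR engine reports a confidence score for each word, and the `OcrWord` record, the 0.8 cut-off, and the sample words are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class OcrWord:
    """One word image cut from a scanned page, plus the OCR engine's guess."""
    word_id: str       # hypothetical identifier for the word image
    guess: str         # text the OCR engine read
    confidence: float  # hypothetical OCR confidence in [0.0, 1.0]


# Hypothetical cut-off: below this, the OCR guess is not trusted.
CONFIDENCE_THRESHOLD = 0.8


def split_words(words, threshold=CONFIDENCE_THRESHOLD):
    """Split OCR output into words the computer recognized and words it did not.

    Recognized words can later serve as 'known' CAPTCHA words; unrecognized
    ones are queued to be shown to humans.
    """
    known, unknown = [], []
    for w in words:
        (known if w.confidence >= threshold else unknown).append(w)
    return known, unknown


if __name__ == "__main__":
    page = [
        OcrWord("p1-w1", "the", 0.98),
        OcrWord("p1-w2", "m0rning", 0.41),  # faded ink: low confidence
        OcrWord("p1-w3", "river", 0.93),
    ]
    known, unknown = split_words(page)
    print([w.guess for w in known])      # ['the', 'river']
    print([w.word_id for w in unknown])  # ['p1-w2']
```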
Actually, each CAPTCHA shows 2 words: one is taken from a book and was not recognized by the computer, while the other is a word the system already knows. The system checks your answer for the known word to decide whether you are human, and because you can't tell which word is which, you have to type both carefully. The order of the two words is switched randomly.
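Here is a minimal sketch of how such a two-word challenge might be built and graded. The `Challenge` record, the `known_words` dictionary, and the case-insensitive comparison are illustrative assumptions, not reCAPTCHA's real implementation; the point is that only the known word decides whether you pass, and your answer for the unknown word is kept only if you do.

```python
import random
from dataclasses import dataclass


@dataclass
class Challenge:
    order: list      # the two word ids, in the order shown to the user
    known_id: str    # control word: the system already knows its text
    unknown_id: str  # word from a book page that OCR could not read


def make_challenge(known_words, unknown_ids, rng=random):
    """Pair one known word with one unknown word and shuffle their order.

    known_words: dict word_id -> correct text (hypothetical store)
    unknown_ids: list of word ids that OCR failed to recognize
    """
    known_id = rng.choice(sorted(known_words))
    unknown_id = rng.choice(unknown_ids)
    order = [known_id, unknown_id]
    rng.shuffle(order)  # the user cannot tell which word is the control
    return Challenge(order, known_id, unknown_id)


def grade(challenge, answers, known_words):
    """Check a user's two answers, typed in display order.

    Returns (passed, vote). Only the known word decides whether the user
    passes; if they pass, their answer for the unknown word is returned
    as a transcription vote, otherwise the vote is None.
    """
    typed = dict(zip(challenge.order, answers))
    expected = known_words[challenge.known_id]
    passed = typed[challenge.known_id].strip().lower() == expected.lower()
    vote = typed[challenge.unknown_id].strip() if passed else None
    return passed, vote


if __name__ == "__main__":
    known = {"k1": "river"}
    ch = make_challenge(known, ["u7"], rng=random.Random(0))
    answers = ["river" if wid == "k1" else "morning" for wid in ch.order]
    print(grade(ch, answers, known))  # (True, 'morning')
```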
The same unknown word is shown in 10 to 15 different CAPTCHAs, and if most of the users agree on what the word is, that answer is stored as the recognized word. This is the process behind reCAPTCHA.
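The agreement step is essentially a vote. The sketch below assumes each vote comes from a user who already passed the known-word check, and that answers are compared case-insensitively; `min_votes` and the majority threshold are hypothetical parameters (the text above only says a word is shown in 10 to 15 CAPTCHAs), not reCAPTCHA's real settings.

```python
from collections import Counter


def resolve_word(votes, min_votes=10, agreement=0.5):
    """Decide an unknown word's text from human transcription votes.

    votes: answers from users who passed the known-word check.
    min_votes: hypothetical minimum number of votes before deciding.
    agreement: hypothetical fraction of votes the top answer must exceed.

    Returns the agreed text (lower-cased), or None if there is no consensus yet.
    """
    cleaned = [v.strip().lower() for v in votes if v and v.strip()]
    if len(cleaned) < min_votes:
        return None  # keep showing the word to more users
    text, count = Counter(cleaned).most_common(1)[0]
    return text if count / len(cleaned) > agreement else None


if __name__ == "__main__":
    votes = ["morning"] * 8 + ["moming", "mornlng"]
    print(resolve_word(votes))            # morning
    print(resolve_word(["morning"] * 3))  # None (too few votes)
```

Requiring a clear majority over several votes, rather than trusting a single answer, means one careless or malicious user cannot decide a word alone.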