I have seen some captchas being decode using javascript, php, etc. How do they do it?
For example, very popular megaupload site\'s captcha has also been
I was involved in a project to circumvent Captcha images on the TicketMaster website about 8-9 years ago for a third-party ticket seller. When an event went on-sale, like a concert, our network of machines would use multiple credit cards and mailing addresses to buy any and every seat possible in the first 10 rows.
Rather than generating new captcha's each time, TM had a limited pool of images they could re-use. We'd create a unique digital fingerprint (checksum) for each image, then simply attack it with some imaging tools (LEADTOOLS.com) (to remove extraneous elements, enhance contrast, etc) and then use OCR tools. It was surprisingly effective.
We were able to crack a great number programmatically, and we'd store the ones we couldn't crack for human processing. Sometimes they'd have a pool of 20K images, so at first we'd get maybe 60-70% automatically, but eventually we'd get 100% success because we could identify the images our humans processed (offline) based on looking up their hash in our database. (That is, we could check a captcha image against our database based on the hash we created and if we already had the solution we could just submit the answer immediately.)
Occasionally, they'd flush and replace their pool of captcha image images with a new set, but again, it would just take us a bit of time to get back up to a 100% rate. The fatal flaw with this particular system was that they recycled images, rather than programmatically generating new captcha images each time.
But the fact is, if the financial incentive to crack the capthcha is high enough, it doesn't take much to create a distributed platform where low-wage unskilled workers can sit around earning pocket change to crack them all day.
Inside India's CAPTCHA solving economy http://www.zdnet.com/blog/security/inside-indias-captcha-solving-economy/1835