How to bypass Google captcha with Selenium and Python?

Submitted by 别来无恙 on 2019-12-29 10:02:11

Question


I want to know how to bypass Google's captcha using Selenium and Python.

When I try to scrape something, Google gives me a captcha. Can I bypass it with Selenium in Python?

As an example, this is Google reCAPTCHA; you can see the captcha via this link: https://www.google.com/recaptcha/api2/demo


Answer 1:


To start with, when using Selenium's Python client you should avoid trying to solve or bypass Google's captcha.


Selenium

Selenium automates browsers. What you want to achieve with that power is entirely up to you, but it is primarily meant for automating web applications through browser clients for testing purposes, and of course it is not limited to that.


Captcha

On the other hand, Captcha (the acronym being ...Completely Automated Public Turing test to tell Computers and Humans Apart...) is a type of challenge–response test used in computing to determine if the user is human.

So Selenium and Captcha serve two completely different purposes, and ideally they shouldn't be used together to achieve interrelated tasks.

Having said that, reCAPTCHA can easily inspect the network traffic and identify your program as a Selenium-driven bot.


Generic Solution

However, there are some generic approaches to avoid getting detected while web scraping:

  • The first attribute a website can use to identify your script/program is your monitor size, so it is recommended not to use the conventional viewport.
  • If you need to send multiple requests to a website, keep changing the User Agent on each request. You can find a detailed discussion in Way to change Google Chrome user agent in Selenium?
  • To simulate human-like behavior you may need to slow down the script execution even beyond WebDriverWait and expected_conditions by adding time.sleep(secs). You can find a detailed discussion in How to sleep webdriver in python for milliseconds
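The three points above can be sketched roughly as follows. This is only an illustration, not the answerer's own code: the helper names (build_chrome_arguments, human_pause, make_driver), the user agent strings, and the window sizes are all hypothetical values chosen for the example. It assumes Chrome and the selenium package are installed if you call make_driver().

```python
import random
import time

# Hypothetical example values; none of these come from the original answer.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
]
WINDOW_SIZES = [(1366, 768), (1440, 900), (1536, 864)]


def build_chrome_arguments(rng=random):
    """Return Chrome command-line arguments with a randomly picked window size and user agent."""
    width, height = rng.choice(WINDOW_SIZES)
    user_agent = rng.choice(USER_AGENTS)
    return [f"--window-size={width},{height}", f"--user-agent={user_agent}"]


def human_pause(min_seconds=1.0, max_seconds=3.0, rng=random, sleep=time.sleep):
    """Sleep for a random duration between min_seconds and max_seconds and return it."""
    delay = rng.uniform(min_seconds, max_seconds)
    sleep(delay)
    return delay


def make_driver():
    """Start Chrome with the arguments above (needs selenium and a local Chrome)."""
    # Imported here so the helpers above can be used without selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for argument in build_chrome_arguments():
        options.add_argument(argument)
    return webdriver.Chrome(options=options)
```

A script would call make_driver() once per session and human_pause() between page interactions. Whether this is enough to avoid detection depends on the site; the sketch only shows where the settings go.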

This use case

However, in a couple of use cases we were able to interact with the reCAPTCHA using Selenium, and you can find more details in the following discussions:

  • How to click on the reCaptcha using Selenium and Java
  • CSS selector for reCaptcha checkbok using Selenium and vba excel
  • Find the reCAPTCHA element and click on it — Python + Selenium

References

You can find a couple of related discussions in:

  • How to make Selenium script undetectable using GeckoDriver and Firefox through Python?
  • Is there a version of selenium that is not detectable ? can selenium be truly undetectable?

tl; dr

  • How does recaptcha 3 know I'm using selenium/chromedriver?



Answer 2:


In order to bypass the captcha when scraping Google, you have to manually solve a captcha and export the cookies Google gives you. Then, every time you open a Selenium WebDriver, make sure you add the cookies you exported. The GOOGLE_ABUSE_EXEMPTION cookie is the one you're looking for, but I would save all cookies just to be on the safe side.

If you want an additional layer of stability in your scrapes, you should export several sets of cookies and have your script randomly select one of them each time you ping Google.

These cookies have a long expiration date, so you wouldn't need to get new cookies every day.

For help on saving and loading cookies in Python and Selenium, you should check out this answer: https://stackoverflow.com/a/15058521/1499769
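As a rough sketch of what saving and loading cookies can look like with Selenium's get_cookies() and add_cookie() methods (the function names save_cookies and load_cookies and the file path are hypothetical, and JSON is used here as the storage format by assumption; the linked answer may do it differently):

```python
import json
from pathlib import Path


def save_cookies(driver, path):
    """Write every cookie from the current browser session to a JSON file."""
    Path(path).write_text(json.dumps(driver.get_cookies()))


def load_cookies(driver, path, url):
    """Open url first, then add the saved cookies to the session.

    Selenium only accepts cookies for the domain of the page that is
    currently loaded, so the page has to be opened before add_cookie().
    Returns the number of cookies that were added.
    """
    driver.get(url)
    cookies = json.loads(Path(path).read_text())
    for cookie in cookies:
        driver.add_cookie(cookie)
    return len(cookies)
```

Typical use would be to call save_cookies(driver, "cookies.json") once after solving the captcha by hand, and load_cookies(driver, "cookies.json", "https://www.google.com") at the start of later sessions, followed by a page refresh.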

Hope this helps!



Source: https://stackoverflow.com/questions/58872451/how-to-bypass-google-captcha-with-selenium-and-python
