Some websites block Selenium WebDriver. How does this work?

ε祈祈猫儿з submitted on 2019-12-12 18:44:13

Question


So I'm trying to crawl clothing websites to build a list of great deals/products to look out for, but I've noticed that some of the websites I try to load don't load. How are websites able to block Selenium WebDriver HTTP requests? Do they look at the headers or something? Can you give me a step-by-step explanation of how Selenium WebDriver sends requests, and how the server receives them and is able to block them?


Answer 1:


Selenium drives a real web browser (typically Firefox or Chrome) to make its requests, so the HTTP traffic itself looks like ordinary browser traffic and the website probably has no idea that you're using Selenium behind the scenes.
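To make the header question concrete, here is a minimal Python sketch (standard library only) of the kind of signal a server could read from a request. A plain scripted client announces itself in the `User-Agent` header by default, whereas a browser driven by Selenium sends the browser's own headers. The `looks_scripted` helper and its marker list are hypothetical and only for illustration; real sites apply their own rules.

```python
import urllib.request

# Substrings a server might treat as "not a normal browser".
# Hypothetical list for illustration; real sites use their own rules.
SCRIPT_MARKERS = ("python-urllib", "python-requests", "curl", "headlesschrome")


def default_urllib_user_agent() -> str:
    """Return the User-Agent that urllib sends when none is set."""
    opener = urllib.request.build_opener()
    headers = dict(opener.addheaders)  # e.g. [('User-agent', 'Python-urllib/3.x')]
    return headers["User-agent"]


def looks_scripted(user_agent: str) -> bool:
    """Server-side style check: does the User-Agent name a known tool?"""
    ua = user_agent.lower()
    return any(marker in ua for marker in SCRIPT_MARKERS)


if __name__ == "__main__":
    ua = default_urllib_user_agent()
    print(ua, "->", looks_scripted(ua))
    # Example of a desktop Firefox User-Agent string.
    browser_ua = "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0"
    print(browser_ua, "->", looks_scripted(browser_ua))
```

A check like this catches naive scripts but not a Selenium-driven Firefox or Chrome, which is consistent with the point above that the headers alone are probably not what gives you away.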

If the website is blocking you, it's probably because of your usage patterns (e.g. you're clogging up their web server by making 1000 requests every minute). That's rude. Don't do that!
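One way to keep the request rate polite is to enforce a minimum delay between page loads. The sketch below is a minimal example under that assumption; the `Throttle` class and the 2-second interval in the usage comment are hypothetical choices, not something the site or Selenium prescribes.

```python
import time


class Throttle:
    """Enforce a minimum delay between consecutive requests."""

    def __init__(self, min_interval: float, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock  # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self) -> float:
        """Sleep if needed so calls are at least min_interval apart.

        Returns the number of seconds slept.
        """
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.min_interval - (now - self._last))
        if delay > 0:
            self._sleep(delay)
        self._last = now + delay
        return delay


# Usage sketch (assumes `driver` is a Selenium WebDriver created elsewhere
# and `urls` is your own list of pages):
#
#     throttle = Throttle(min_interval=2.0)
#     for url in urls:
#         throttle.wait()
#         driver.get(url)
```

Calling `wait()` before every `driver.get()` caps the crawl at one page per interval regardless of how quickly each page loads.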

One exception would be if you're running Selenium "headless" with the HtmlUnitDriver, which simulates a browser rather than driving a real one. The website can detect that.




Answer 2:


It's very likely that the website is blocking you because of your AWS IP. Not only does that tell the website that somebody is likely scraping it programmatically, but most websites also limit the number of queries they will accept from any one IP address. You most likely need a proxy service to route your requests through.
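The per-IP limit described above can be sketched from the server's side as a sliding-window counter keyed by client address. This is only an illustrative assumption about how such a limit might be implemented; the `PerIpLimiter` class, its parameters, and the example addresses are hypothetical.

```python
import time
from collections import defaultdict, deque


class PerIpLimiter:
    """Allow at most max_requests per IP within a sliding time window."""

    def __init__(self, max_requests: int, window_seconds: float, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._clock = clock
        self._hits = defaultdict(deque)  # ip -> timestamps of recent requests

    def allow(self, ip: str) -> bool:
        """Record a request from `ip`; return False if it exceeds the limit."""
        now = self._clock()
        hits = self._hits[ip]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # over the limit: a server would reject or block here
        hits.append(now)
        return True


if __name__ == "__main__":
    limiter = PerIpLimiter(max_requests=2, window_seconds=60.0)
    for _ in range(3):
        print(limiter.allow("203.0.113.5"))  # third call exceeds the limit
```

Because the counter is keyed by IP address, requests arriving from different addresses are counted separately, which is why the answer suggests a proxy service.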



Source: https://stackoverflow.com/questions/40750049/some-websites-block-selenium-webdriver-how-does-this-work
