I used to scrape an e-commerce website occasionally to collect product price information. I hadn't used the scraper, built with Scrapy, in a while, and yesterday, when I tried to run it again, I kept getting 503 errors.
If you're getting a 503 error, you can follow these guidelines.

First, set a realistic USER_AGENT in settings.py:

```python
# settings.py
USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'
```
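If you'd rather not change the project-wide default, Scrapy also lets a single spider override it through the `custom_settings` class attribute. This is only a sketch; the spider name is hypothetical.

```python
import scrapy


class PriceCheckSpider(scrapy.Spider):
    # Hypothetical spider name, for illustration only.
    name = 'price_check'

    # Overrides the project-wide USER_AGENT for this spider alone.
    custom_settings = {
        'USER_AGENT': ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                       '(KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'),
    }
```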
In my case the 503 came from Cloudflare's anti-bot check, which has to be passed by executing JavaScript. So I executed that JavaScript from Python with the help of cloudflare-scrape.
You need to add the following code to your spider:

```python
def start_requests(self):
    for url in self.start_urls:
        # get_tokens() solves the Cloudflare challenge and returns the
        # clearance cookies together with the user agent they were issued for.
        token, agent = cfscrape.get_tokens(url, 'Your preferable user agent, optional')
        yield Request(url=url, cookies=token, headers={'User-Agent': agent})
```
This goes alongside your parsing functions. And that's it!
Of course, you need to install cloudflare-scrape first and import it into your spider. You also need a JavaScript execution engine installed; I already had Node.js, so no complaints there.
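Putting it together, here is a minimal sketch of how the spider might look. The spider name, start URL, and parsed fields are hypothetical placeholders; the cfscrape calls are the ones shown above.

```python
import cfscrape
import scrapy
from scrapy import Request


class ProductPricesSpider(scrapy.Spider):
    # Hypothetical spider name and start URL, for illustration only.
    name = 'product_prices'
    start_urls = ['https://www.example-shop.com/products']

    def start_requests(self):
        for url in self.start_urls:
            # Solve the Cloudflare challenge once per start URL and reuse
            # the clearance cookies with the matching user agent.
            token, agent = cfscrape.get_tokens(url)
            yield Request(url=url, cookies=token, headers={'User-Agent': agent})

    def parse(self, response):
        # Hypothetical parsing logic: adjust the selectors to your site.
        for product in response.css('div.product'):
            yield {
                'name': product.css('h2::text').get(),
                'price': product.css('span.price::text').get(),
            }
```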
Obviously the best way to deal with this would be to whitelist your IP in CloudFlare; if that isn't suitable, let me recommend the cloudflare-scrape library. You can use it to get the clearance cookie token, then provide that token in your Scrapy request back to the server.
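If you want to confirm that cloudflare-scrape can get past the protection before wiring it into Scrapy, the library also provides a requests-style session. A quick sanity check, with a placeholder URL:

```python
import cfscrape

# create_scraper() returns a requests-compatible session that solves the
# Cloudflare JavaScript challenge transparently on the first request.
scraper = cfscrape.create_scraper()

# Hypothetical product URL; replace it with one of your start URLs.
response = scraper.get('https://www.example-shop.com/products')
print(response.status_code)  # should be 200 instead of 503 once the challenge passes
```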