I used to scrape an e-commerce website occasionally to collect product price information. I hadn't used the scraper, built with Scrapy, in a while, and yesterday, when I tried to run it again, I kept getting 503 errors.
If you're getting a 503 error, you can follow these guidelines.

First, set a realistic USER_AGENT in settings.py:

```python
# settings.py
USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'
```
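If you'd rather not change the project-wide default, Scrapy also lets a single spider override it through the `custom_settings` class attribute. This is only a sketch; the spider name is hypothetical.

```python
import scrapy


class PriceCheckSpider(scrapy.Spider):
    # Hypothetical spider name, for illustration only.
    name = 'price_check'

    # Overrides the project-wide USER_AGENT for this spider alone.
    custom_settings = {
        'USER_AGENT': ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                       '(KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'),
    }
```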
In my case the 503 came from Cloudflare's anti-bot check, which has to be passed by executing JavaScript. So I executed that JavaScript from Python with the help of cloudflare-scrape.
You need to add the following code to your spider:

```python
def start_requests(self):
    for url in self.start_urls:
        # get_tokens() solves the Cloudflare challenge and returns the
        # clearance cookies together with the user agent they were issued for.
        token, agent = cfscrape.get_tokens(url, 'Your preferable user agent, optional')
        yield Request(url=url, cookies=token, headers={'User-Agent': agent})
```
This goes alongside your parsing functions. And that's it!
Of course, you need to install cloudflare-scrape first and import it into your spider. You also need a JavaScript execution engine installed; I already had Node.js, so no complaints there.
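Putting it together, here is a minimal sketch of how the spider might look. The spider name, start URL, and parsed fields are hypothetical placeholders; the cfscrape calls are the ones shown above.

```python
import cfscrape
import scrapy
from scrapy import Request


class ProductPricesSpider(scrapy.Spider):
    # Hypothetical spider name and start URL, for illustration only.
    name = 'product_prices'
    start_urls = ['https://www.example-shop.com/products']

    def start_requests(self):
        for url in self.start_urls:
            # Solve the Cloudflare challenge once per start URL and reuse
            # the clearance cookies with the matching user agent.
            token, agent = cfscrape.get_tokens(url)
            yield Request(url=url, cookies=token, headers={'User-Agent': agent})

    def parse(self, response):
        # Hypothetical parsing logic: adjust the selectors to your site.
        for product in response.css('div.product'):
            yield {
                'name': product.css('h2::text').get(),
                'price': product.css('span.price::text').get(),
            }
```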
Obviously the best way to deal with this would be to whitelist your IP in CloudFlare; if that isn't suitable, let me recommend the cloudflare-scrape library. You can use it to get the clearance cookie token, then provide that token in your Scrapy request back to the server.
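If you want to confirm that cloudflare-scrape can get past the protection before wiring it into Scrapy, the library also provides a requests-style session. A quick sanity check, with a placeholder URL:

```python
import cfscrape

# create_scraper() returns a requests-compatible session that solves the
# Cloudflare JavaScript challenge transparently on the first request.
scraper = cfscrape.create_scraper()

# Hypothetical product URL; replace it with one of your start URLs.
response = scraper.get('https://www.example-shop.com/products')
print(response.status_code)  # should be 200 instead of 503 once the challenge passes
```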