Scrapy Shell - How to change USER_AGENT

妖精的绣舞 submitted on 2020-11-30 06:26:05

Question


I have a fully functioning scrapy script to extract data from a website. During setup, the target site banned me based on my USER_AGENT information. I subsequently added a RotateUserAgentMiddleware to rotate the USER_AGENT randomly. This works great.
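The RotateUserAgentMiddleware itself is not shown in the question. As a rough sketch only, a downloader middleware of that kind might look like the following; the USER_AGENTS pool and its values are hypothetical, and the class uses the classic process_request(request, spider) hook so it can be read without Scrapy installed.

```python
import random

# Hypothetical pool of User-Agent strings; replace with real values.
USER_AGENTS = [
    "ExampleBrowser/1.0 (hypothetical)",
    "ExampleBrowser/2.0 (hypothetical)",
]


class RotateUserAgentMiddleware:
    """Sketch of a downloader middleware that picks a random User-Agent per request."""

    def __init__(self, user_agents=USER_AGENTS):
        self.user_agents = list(user_agents)

    def process_request(self, request, spider):
        # Overwrite the User-Agent header on the outgoing request.
        request.headers["User-Agent"] = random.choice(self.user_agents)
        # Returning None tells Scrapy to keep processing the request as usual.
        return None
```

A class like this would be enabled through the DOWNLOADER_MIDDLEWARES setting of the project.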

However, when I now try to use the scrapy shell to test XPath and CSS selectors, I get a 403 error. I'm sure this is because the USER_AGENT of the scrapy shell defaults to some value the target site has blacklisted.

Question: is it possible to fetch a URL in the scrapy shell with a different USER_AGENT than the default?

fetch('http://www.test') [add something ?? to change USER_AGENT]

Thx


Answer 1:


scrapy shell -s USER_AGENT='custom user agent' 'http://www.example.com'
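The -s option overrides a setting for that single invocation, so the quoting of a value containing spaces matters. If the command is launched from a script, a small helper along these lines could build it; the function name scrapy_shell_command is hypothetical, and shlex.join needs Python 3.8 or later.

```python
import shlex


def scrapy_shell_command(url, user_agent):
    """Build the argument list for `scrapy shell` with USER_AGENT overridden via -s."""
    # Each list element is one argument, so spaces in the value need no manual quoting.
    return ["scrapy", "shell", "-s", f"USER_AGENT={user_agent}", url]


cmd = scrapy_shell_command("http://www.example.com", "custom user agent")
# Print a shell-safe version of the command for copying into a terminal.
print(shlex.join(cmd))
```

The list form can also be handed to subprocess.run directly, which sidesteps shell quoting altogether.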




Answer 2:


Inside the scrapy shell, you can set the User-Agent in the request header.

url = 'http://www.example.com'
request = scrapy.Request(url, headers={'User-Agent': 'Mybot'})
fetch(request)
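When several URLs are tested in one shell session, the header dictionary can be built once and reused. A minimal sketch, assuming a hypothetical helper named ua_headers; the shell usage is left in comments because scrapy.Request and fetch exist only inside the scrapy shell.

```python
def ua_headers(user_agent, extra=None):
    """Return a headers dict carrying the given User-Agent plus optional extra headers."""
    headers = {"User-Agent": user_agent}
    if extra:
        # Extra headers are merged in; an explicit User-Agent in `extra` would win.
        headers.update(extra)
    return headers


# Inside the scrapy shell (not runnable outside it):
#   request = scrapy.Request('http://www.example.com', headers=ua_headers('Mybot'))
#   fetch(request)
```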


Source: https://stackoverflow.com/questions/25429671/scrapy-shell-how-to-change-user-agent
