Scrapy Python Set up User Agent

被刻印的时光 ゝ submitted on 2019-12-17 15:53:42

Question


I tried to override the user agent of my CrawlSpider by adding an extra line to the project configuration file. Here is the file:

[settings]
default = myproject.settings
USER_AGENT = "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.93 Safari/537.36"


[deploy]
#url = http://localhost:6800/
project = myproject

But when I run the crawler against my own website, the spider does not pick up my customized user agent; it still sends the default one, "Scrapy/0.18.2 (+http://scrapy.org)". Can anyone explain what I have done wrong?

Note:

(1) It works when I override the user agent globally on the command line:

scrapy crawl myproject.com -o output.csv -t csv -s USER_AGENT="Mozilla...."

(2) When I remove the line "default = myproject.settings" from the configuration file and run scrapy crawl myproject.com, it says "cannot find spider..", so I think the default setting should not be removed in this case.

Thanks a lot for the help in advance.


Answer 1:


Move your USER_AGENT line into your settings.py file rather than your scrapy.cfg file. If you created the project with the scrapy startproject command, settings.py sits at the same level as items.py; in your case it should be something like myproject/settings.py.
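As a rough sketch of what that could look like, here is a minimal settings.py with the line from the question moved into it. The BOT_NAME, SPIDER_MODULES and NEWSPIDER_MODULE values are assumptions based on what scrapy startproject myproject usually generates, so keep whatever your own file already contains.

```python
# myproject/settings.py (path assumed from the answer above)

# Assumed defaults, similar to what `scrapy startproject myproject` generates;
# keep the values your project already has.
BOT_NAME = "myproject"
SPIDER_MODULES = ["myproject.spiders"]
NEWSPIDER_MODULE = "myproject.spiders"

# The line moved out of scrapy.cfg. In settings.py it is a plain Python
# assignment, so the string must be quoted.
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/27.0.1453.93 Safari/537.36"
)
```

scrapy.cfg then only needs the default = myproject.settings entry under [settings] so that Scrapy can find this module.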

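To see why the extra line in scrapy.cfg has no effect: as far as I understand it, Scrapy reads scrapy.cfg as an INI file and uses the [settings] section only to map a project name (here "default") to a settings module; it does not read individual setting values from it. The sketch below uses Python's standard configparser, not Scrapy's own loader, and the helper name read_settings_section is made up for illustration.

```python
import configparser

# The scrapy.cfg from the question, including the misplaced USER_AGENT line.
SCRAPY_CFG = """\
[settings]
default = myproject.settings
USER_AGENT = "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.93 Safari/537.36"

[deploy]
#url = http://localhost:6800/
project = myproject
"""


def read_settings_section(cfg_text):
    """Hypothetical helper: return the [settings] section as a plain dict."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    return dict(parser["settings"])


section = read_settings_section(SCRAPY_CFG)

# The "default" entry is the one that names the settings module to import.
settings_module = section["default"]
print(settings_module)

# To an INI parser, USER_AGENT is just one more key in the section
# (lowercased by configparser), not a Scrapy setting.
print(sorted(section))
```

This also fits note (2) in the question: without the default entry there is no settings module to load, so the spiders cannot be found.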



Answer 2:


This is for anyone who lands here and controls the Scrapy crawl manually, i.e. does not start it from the shell with the scrapy crawl command:

$ scrapy crawl myproject

but instead uses CrawlerProcess() or CrawlerRunner():

process = CrawlerProcess()

or

process = CrawlerRunner()

In that case the user agent, along with any other settings, can be passed to the crawler as a dictionary of settings, like this:

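    from scrapy.crawler import CrawlerProcess

    # Settings are passed as a dict when the process is created.
    process = CrawlerProcess(
        {
            'USER_AGENT': 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'
        }
    )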



Answer 3:


I had the same problem. Try running your spider as superuser. I had been running the spider directly with "scrapy runspider"; when I ran it with "sudo scrapy runspider" instead, it worked.



Source: https://stackoverflow.com/questions/18920930/scrapy-python-set-up-user-agent
