Scrapy FakeUserAgentError: Error occurred during getting browser

Submitted by 戏子无情 on 2019-12-24 05:09:09

Question


I use scrapy-fake-useragent and keep getting this error on my Linux server.

Traceback (most recent call last):
  File "/usr/local/lib64/python2.7/site-packages/twisted/internet/defer.py", line 1299, in _inlineCallbacks
    result = g.send(result)
  File "/usr/local/lib/python2.7/site-packages/scrapy/core/downloader/middleware.py", line 37, in process_request
    response = yield method(request=request, spider=spider)
  File "/usr/local/lib/python2.7/site-packages/scrapy_fake_useragent/middleware.py", line 27, in process_request
    request.headers.setdefault('User-Agent', self.ua.random)
  File "/usr/local/lib/python2.7/site-packages/fake_useragent/fake.py", line 98, in __getattr__
    raise FakeUserAgentError('Error occurred during getting browser')  # noqa
FakeUserAgentError: Error occurred during getting browser

I keep getting this error on the Linux server when I run multiple spiders concurrently. The error rarely happens on my own laptop. What should I do to avoid it? Do I need more RAM? The server has 512MB RAM and 1 vCPU.


Answer 1:


I am not sure whether RAM is the cause, or why the error only happens on the low-spec Linux server. I solved it by using fake-useragent's fallback feature. Unfortunately, scrapy-fake-useragent doesn't expose a convenient setting for it, so I had to override the middleware in middlewares.py like this:

from fake_useragent import UserAgent
from scrapy_fake_useragent.middleware import RandomUserAgentMiddleware

class FakeUserAgentMiddleware(RandomUserAgentMiddleware):
    def __init__(self, crawler):
        super(FakeUserAgentMiddleware, self).__init__(crawler)
        # If failed to get random user agent, use the most common one
        self.ua = UserAgent(fallback=(
            'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
            'AppleWebKit/537.36 (KHTML, like Gecko) '
            'Chrome/56.0.2924.87 Safari/537.36'
        ))
        self.per_proxy = crawler.settings.get('RANDOM_UA_PER_PROXY', False)
        self.ua_type = crawler.settings.get('RANDOM_UA_TYPE', 'random')
        self.proxy2ua = {}

Then I activate the middleware in settings.py like this:

DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
    # 'scrapy_fake_useragent.middleware.RandomUserAgentMiddleware': 400, # disable the original middleware
    'myproject.middlewares.FakeUserAgentMiddleware': 400,
    # omitted
}
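The fallback idea itself is independent of the library: pick a user agent at random, and if the UA pool is unavailable (for example, because the remote UA database could not be fetched), return a fixed common UA instead of raising. A minimal stdlib-only sketch of that pattern — the names here are illustrative, not part of fake-useragent's API:

```python
import random

# The same common Chrome UA used as the fallback above
FALLBACK_UA = (
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
    'AppleWebKit/537.36 (KHTML, like Gecko) '
    'Chrome/56.0.2924.87 Safari/537.36'
)

def random_user_agent(pool):
    """Return a random UA from pool; fall back to FALLBACK_UA when the pool is empty."""
    return random.choice(pool) if pool else FALLBACK_UA
```

With a fallback in place, a request always gets some User-Agent header, so a transient failure to load the UA database no longer kills the spider.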

UPDATE

Try updating fake-useragent to version 0.1.5. I was on 0.1.4, and after upgrading the problem was gone at the root, without needing the fallback.
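Assuming pip manages the environment, the upgrade is a one-liner (pinning at least the version that reportedly fixed the issue):

```shell
# Upgrade fake-useragent to at least 0.1.5
pip install --upgrade 'fake-useragent>=0.1.5'
```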




Answer 2:


I'm using fake-useragent 0.1.7 and hit the same issue.

However, I have fixed it on my server. Here is the issue ticket with my suggested workaround:

https://github.com/hellysmile/fake-useragent/issues/59

Hope that helps.



Source: https://stackoverflow.com/questions/43023805/scrapy-fakeuseragenterror-error-occurred-during-getting-browser
