How to use scrapy.log module with custom log handler?


Question


I have been working on a Scrapy project and so far everything works quite well. However, I'm not satisfied with Scrapy's logging configuration options. At the moment, I have set LOG_FILE = 'my_spider.log' in the settings.py of my project. When I execute scrapy crawl my_spider on the command line, it creates one big log file for the entire crawling process. This is not feasible for my purposes.

How can I use Python's custom log handlers in combination with the scrapy.log module? In particular, I want to use Python's logging.handlers.RotatingFileHandler so that I can split the log data into several small files instead of having to deal with one huge file. Unfortunately, the documentation of Scrapy's logging facility is not very extensive. Many thanks in advance!


Answer 1:


You can send all Scrapy log output to a file by first disabling the root handler via scrapy.utils.log.configure_logging and then adding your own log handler.

In the settings.py file of your Scrapy project, add the following code:

import logging
from logging.handlers import RotatingFileHandler

from scrapy.utils.log import configure_logging

LOG_ENABLED = False
# Disable default Scrapy log settings.
configure_logging(install_root_handler=False)

# Define your logging settings.
log_file = '/tmp/logs/CRAWLER_logs.log'

root_logger = logging.getLogger()
root_logger.setLevel(logging.DEBUG)
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
# Rotate when the file reaches 10 MB, keeping one backup.
rotating_file_log = RotatingFileHandler(log_file, maxBytes=10485760, backupCount=1)
rotating_file_log.setLevel(logging.DEBUG)
rotating_file_log.setFormatter(formatter)
root_logger.addHandler(rotating_file_log)

You can also adjust the log level (e.g. DEBUG to INFO) and the formatter as required. To emit your own log messages from a spider or pipeline, use standard Python logging as follows:

Inside pipelines.py

import logging

logger = logging.getLogger(__name__)
# e.g. inside your pipeline's process_item() method:
logger.info('processing item')
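
The same works inside a spider; a minimal sketch (the spider name and URL here are placeholder assumptions):

import logging

import scrapy

logger = logging.getLogger(__name__)

class MySpider(scrapy.Spider):
    name = 'my_spider'  # hypothetical spider name
    start_urls = ['http://www.example.com/']

    def parse(self, response):
        # Messages propagate to the root logger configured in settings.py,
        # so they also end up in the rotating log file.
        logger.info('parsed %s', response.url)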

Hope this helps!




Answer 2:


You could integrate a custom log file like so (I'm not sure how to integrate the rotator):

In your spider class file:

from datetime import datetime
from scrapy import log
from scrapy.spider import BaseSpider

class ExampleSpider(BaseSpider):
    name = "example"
    allowed_domains = ["example.com"]
    start_urls = ["http://www.example.com/"]

    def __init__(self, name=None, **kwargs):
        LOG_FILE = "scrapy_%s_%s.log" % (self.name, datetime.now())
        # remove the current log
        # log.log.removeObserver(log.log.theLogPublisher.observers[0])
        # re-create the default Twisted observer which Scrapy checks
        log.log.defaultObserver = log.log.DefaultObserver()
        # start the default observer so it can be stopped
        log.log.defaultObserver.start()
        # trick Scrapy into thinking logging has not started
        log.started = False
        # start the new log file observer
        log.start(LOG_FILE)
        # continue with the normal spider init
        super(ExampleSpider, self).__init__(name, **kwargs)

    def parse(self, response):
        ...

And the output file might look like:

scrapy_example_2012-08-25 12:34:48.823896.log
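
Note that this answer relies on the old Twisted-based scrapy.log module, which has since been removed (Scrapy switched to the standard Python logging module in 1.0), so the snippet above only applies to old Scrapy versions. A rough modern sketch of the same idea, a timestamped per-run log file attached in the spider's __init__; the strftime format is my own assumption, chosen to keep colons out of the filename:

import logging
from datetime import datetime

import scrapy

class ExampleSpider(scrapy.Spider):
    name = "example"
    allowed_domains = ["example.com"]
    start_urls = ["http://www.example.com/"]

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # One log file per run, e.g. scrapy_example_2012-08-25_12-34-48.log
        stamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
        handler = logging.FileHandler("scrapy_%s_%s.log" % (self.name, stamp))
        handler.setFormatter(logging.Formatter(
            '%(asctime)s [%(name)s] %(levelname)s: %(message)s'))
        # Scrapy's own messages propagate to the root logger, so this
        # handler captures them alongside your custom messages.
        logging.getLogger().addHandler(handler)

    def parse(self, response):
        ...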




Answer 3:


Scrapy uses the standard Python logging module, which means you can grab and modify the loggers as you create your spider.

import scrapy
import logging
from logging.handlers import RotatingFileHandler


class SpiderSpider(scrapy.Spider):
    name = 'spider'
    start_urls = ['https://en.wikipedia.org/wiki/Spider']

    # Attach a rotating handler to the root logger (this runs once, when
    # the class body is executed): rotate at 1 KB, keep three backups.
    handler = RotatingFileHandler('spider.log', maxBytes=1024, backupCount=3)
    logging.getLogger().addHandler(handler)

    def parse(self, response):
        ...
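
As a side note, since Scrapy 1.0 every spider also exposes a self.logger shortcut (a logger named after the spider); its messages propagate to the root logger, so the rotating handler attached above receives them too:

import scrapy

class SpiderSpider(scrapy.Spider):
    name = 'spider'
    start_urls = ['https://en.wikipedia.org/wiki/Spider']

    def parse(self, response):
        # self.logger is roughly logging.getLogger(self.name)
        self.logger.info('parsed %s', response.url)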


Source: https://stackoverflow.com/questions/11942403/how-to-use-scrapy-log-module-with-custom-log-handler
