Removing Airflow task logs

浪尽此生 提交于 2019-12-21 06:51:31

问题


I'm running 5 DAG's which have generated a total of about 6GB of log data in the base_log_folder over a months period. I just added a remote_base_log_folder but it seems it does not exclude logging to the base_log_folder.

Is there anyway to automatically remove old log files, rotate them or force airflow to not log on disk (base_log_folder) only in remote storage?


回答1:


Please refer https://github.com/teamclairvoyant/airflow-maintenance-dags

This plugin has DAGs that can kill halted tasks and log-cleanups. You can grab the concepts and can come up with a new DAG that can cleanup as per your requirement.




回答2:


We remove the Task logs by implementing our own FileTaskHandler, and then pointing to it in the airflow.cfg. So, we overwrite the default LogHandler to keep only N task logs, without scheduling additional DAGs.

We are using Airflow==1.10.1.

[core]
logging_config_class = log_config.LOGGING_CONFIG

log_config.LOGGING_CONFIG

BASE_LOG_FOLDER = conf.get('core', 'BASE_LOG_FOLDER')
FOLDER_TASK_TEMPLATE = '{{ ti.dag_id }}/{{ ti.task_id }}'
FILENAME_TEMPLATE = '{{ ti.dag_id }}/{{ ti.task_id }}/{{ ts }}/{{ try_number }}.log'

LOGGING_CONFIG = {
    'formatters': {},
    'handlers': {
        '...': {},
        'task': {
            'class': 'file_task_handler.FileTaskRotationHandler',
            'formatter': 'airflow.job',
            'base_log_folder': os.path.expanduser(BASE_LOG_FOLDER),
            'filename_template': FILENAME_TEMPLATE,
            'folder_task_template': FOLDER_TASK_TEMPLATE,
            'retention': 20
        },
        '...': {}
    },
    'loggers': {
        'airflow.task': {
            'handlers': ['task'],
            'level': JOB_LOG_LEVEL,
            'propagate': False,
        },
        'airflow.task_runner': {
            'handlers': ['task'],
            'level': LOG_LEVEL,
            'propagate': True,
        },
        '...': {}
    }
}

file_task_handler.FileTaskRotationHandler

import os
import shutil

from airflow.utils.helpers import parse_template_string
from airflow.utils.log.file_task_handler import FileTaskHandler


class FileTaskRotationHandler(FileTaskHandler):

    def __init__(self, base_log_folder, filename_template, folder_task_template, retention):
        """
        :param base_log_folder: Base log folder to place logs.
        :param filename_template: template filename string.
        :param folder_task_template: template folder task path.
        :param retention: Number of folder logs to keep
        """
        super(FileTaskRotationHandler, self).__init__(base_log_folder, filename_template)
        self.retention = retention
        self.folder_task_template, self.folder_task_template_jinja_template = \
            parse_template_string(folder_task_template)

    @staticmethod
    def _get_directories(path='.'):
        return next(os.walk(path))[1]

    def _render_folder_task_path(self, ti):
        if self.folder_task_template_jinja_template:
            jinja_context = ti.get_template_context()
            return self.folder_task_template_jinja_template.render(**jinja_context)

        return self.folder_task_template.format(dag_id=ti.dag_id, task_id=ti.task_id)

    def _init_file(self, ti):
        relative_path = self._render_folder_task_path(ti)
        folder_task_path = os.path.join(self.local_base, relative_path)
        subfolders = self._get_directories(folder_task_path)
        to_remove = set(subfolders) - set(subfolders[-self.retention:])

        for dir_to_remove in to_remove:
            full_dir_to_remove = os.path.join(folder_task_path, dir_to_remove)
            print('Removing', full_dir_to_remove)
            shutil.rmtree(full_dir_to_remove)

        return FileTaskHandler._init_file(self, ti)



回答3:


Airflow maintainers don't think truncating logs is a part of airflow core logic, to see this, and then in this issue, maintainers suggest to change LOG_LEVEL avoid too many log data.

And in this PR, we can learn how to change log level in airflow.cfg.

good luck.




回答4:


I don't think that there is a rotation mechanism but you can store them in S3 or google cloud storage as describe here : https://airflow.incubator.apache.org/configuration.html#logs



来源:https://stackoverflow.com/questions/43548744/removing-airflow-task-logs

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!