How to ignore failures on Luigi tasks triggered inside another task's run()

痞子三分冷 submitted on 2021-01-27 11:43:59

Question


Consider the following tasks:

import luigi


class YieldFailTaskInBatches(luigi.Task):
    def run(self):
        for i in range(5):
            yield [
                FailTask(i, j)
                for j in range(2)
            ]


class YieldAllFailTasksAtOnce(luigi.Task):
    def run(self):
        yield [
            FailTask(i, j)
            for j in range(2)
            for i in range(5)
        ]

class FailTask(luigi.Task):
    i = luigi.IntParameter()
    j = luigi.IntParameter()

    def run(self):
        print("i: %d, j: %d" % (self.i, self.j))
        if self.j > 0:
            raise Exception("i: %d, j: %d" % (self.i, self.j))

FailTask fails if j > 0. YieldFailTaskInBatches yields FailTask instances in batches inside a for loop, while YieldAllFailTasksAtOnce yields all of them in a single list.

If I run YieldFailTaskInBatches, Luigi runs the tasks yielded in the first loop iteration and, because one of them fails (i = 0, j = 1), the remaining batches are never yielded. If I run YieldAllFailTasksAtOnce, Luigi runs all the tasks as expected.

My question is: how can I tell Luigi to keep running the remaining tasks in YieldFailTaskInBatches, even if some of the tasks failed? Is it possible at all?

The reason I'm asking is that I have around 400k tasks to trigger. I don't want to trigger them all at once, because Luigi would spend too much time building each task's requirements (each task can have between 1 and 400 requirements). My current solution is to yield them in batches, a few at a time, but if any of these fail, the task stops and the remaining ones aren't yielded.

It seems that this issue could solve this problem if implemented, but I'm wondering if there's some other way.


Answer 1:


This is very hackish, but it should do what you want:

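import luigi


class YieldAll(luigi.Task):
    def run(self):
        errors = []
        for i in range(5):
            for j in range(2):
                try:
                    # Call run() directly instead of yielding the task,
                    # so one failure does not stop the loop.
                    FailTask(i, j).run()
                except Exception as e:
                    errors.append(e)

        # Fail YieldAll once at the end if any FailTask raised.
        if errors:
            raise ValueError(f'all errors: {errors}')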

class FailTask(luigi.Task):
    i = luigi.IntParameter()
    j = luigi.IntParameter()

    def run(self):
        print("i: %d, j: %d" % (self.i, self.j))
        if self.j > 0:
            raise Exception("i: %d, j: %d" % (self.i, self.j))

So basically you are running the tasks outside of the Luigi context. Unless a task writes an output target, Luigi will never know whether it has run.

The only task Luigi is aware of is YieldAll. If any FailTask raises an error, the code catches it, and YieldAll is marked as failed once the loop has finished.
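
To make the pattern easier to see on its own, here is a minimal sketch in plain Python, without Luigi, of running jobs in batches while collecting failures instead of stopping at the first one. The names batched, fail_job and run_all are hypothetical helpers invented for this illustration; fail_job only mimics the j > 0 failure rule of FailTask.run() and is not part of Luigi.

```python
from itertools import islice


def batched(items, size):
    """Yield successive lists of at most `size` items (hypothetical helper)."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def fail_job(i, j):
    """Hypothetical stand-in for FailTask.run(): fails whenever j > 0."""
    if j > 0:
        raise Exception("i: %d, j: %d" % (i, j))
    return (i, j)


def run_all(jobs, batch_size=2):
    """Run every job batch by batch, recording failures instead of stopping."""
    succeeded, errors = [], []
    for batch in batched(jobs, batch_size):
        for i, j in batch:
            try:
                succeeded.append(fail_job(i, j))
            except Exception as exc:
                # Remember the failure and carry on with the next job.
                errors.append(exc)
    return succeeded, errors


jobs = [(i, j) for i in range(5) for j in range(2)]
succeeded, errors = run_all(jobs)
```

Inside a Luigi task, the caller would raise once at the end when errors is non-empty, as YieldAll does, so that the wrapper task is still reported as failed.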



Source: https://stackoverflow.com/questions/53523339/how-to-ignore-failures-on-luigi-tasks-triggered-inside-another-tasks-run
