Luigi: how to pass arguments to dependencies using luigi.build interface?

Posted by 落爺英雄遲暮 on 2020-12-13 04:52:50

Question


Consider a situation where a task depends on another task that is chosen at runtime through a parameter:

import luigi
from luigi import Task, TaskParameter, IntParameter

class TaskA(Task):
    parent = TaskParameter()
    arg = IntParameter(default=0)
    def requires(self):
        return self.parent()
    def run(self):
        print(f"task A arg = {self.arg}")

class TaskB(Task):
    arg = IntParameter(default=0)
    def run(self):
        print(f"task B arg = {self.arg}")

if __name__ == "__main__":
    luigi.run(["TaskA", "--parent", "TaskB", "--arg", "1", "--TaskB-arg", "2"])

(Note that both tasks declare arg with a default of 0.)

Using the luigi.run() interface, this works. TaskA receives two arguments, parent=TaskB and arg=1, and TaskB receives arg=2 through the --TaskB-arg syntax.


Scheduled 2 tasks of which:
* 1 ran successfully:
    - 1 TaskB(arg=2)
* 1 failed:
    - 1 TaskA(parent=TaskB, arg=1)

This progress looks :( because there were failed tasks

===== Luigi Execution Summary =====

(TaskA fails in this example because TaskB does not write its output to a file that TaskA can read, but that is only to keep the example short. The important point is that both TaskA and TaskB are passed the correct arg.)

My problem now is: how do I do exactly the same thing using the luigi.build() interface? There are two reasons I want this. First, the source code says that luigi.run() shouldn't be used. Second, I can't call luigi.run() more than once per process, but I can with luigi.build(). This matters because I want to do something like:

if __name__ == "__main__":
    for i in range(3):
        luigi.run(["TaskA", "--parent", "TaskB", "--arg", f"{i}", "--TaskB-arg", f"{i}"])

However, if you try this, you get the error:

Pid(s) {10084} already running

With the luigi.build() interface, you are supposed to pass a list of task instances that have already been constructed with their parameters:

if __name__ == "__main__":
    for i in range(3):
        luigi.build([TaskA(parent=TaskB, arg=i)])

This does what is expected for TaskA, but TaskB takes the default arg=0.

So the question is: how do I pass arguments to dependencies using the luigi.build() interface?

Here are the things I've tried that don't work:

A)

if __name__ == "__main__":
    for i in range(3):
        luigi.build([TaskA(parent=TaskB, arg=i), TaskB(arg=i)])

This doesn't work because two instances of TaskB are run: one with the default (wrong) arg, which TaskA depends on, and one with the correct arg, which TaskA doesn't depend on.

B)

if __name__ == "__main__":
    for i in range(3):
        luigi.build([TaskA(parent=TaskB(arg=i), arg=i)])

TypeError: 'TaskB' object is not callable

C)

if __name__ == "__main__":
    for i in range(3):
        luigi.build([TaskA(parent=TaskB, arg=i)], "--TaskB-arg", f"{i}")

Getting desperate, I tried something like the old interface, but it doesn't work:

AttributeError: 'str' object has no attribute 'create_remote_scheduler'


Answer 1:


I believe the problem is that you are passing the parent as a class rather than as a Task instance. Try passing it like this:

luigi.build([TaskA(parent=TaskB(arg=i), ...)])

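Edit: You will then also need to modify TaskA, because it currently has

def requires(self):
    return self.parent()

which constructs the parent as a TaskB object with default parameters, and which raises the "object is not callable" TypeError once self.parent is already an instance.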

Edit 2: This design is actually not encouraged. If you run with multiple workers, the task will not pickle and unpickle correctly. I would recommend creating a new ParameterizedTaskParameter (or some better name) that pickles a task instance and stores it the way an object parameter does.



Source: https://stackoverflow.com/questions/64837259/luigi-how-to-pass-arguments-to-dependencies-using-luigi-build-interface
