How to run parallel instances of a Luigi pipeline: Pid set already running

Submitted by ﹥>﹥吖頭↗ on 2019-12-24 07:29:13

Question


I have a simple pipeline.

I want to start it once with the ID 2381 and then, while that first job is still running, start a second run with the ID 231. The first run completes as expected.

The second run prints this response instead:

Pid(s) set([10362]) already running
Process finished with exit code 0

I am starting the runs like this.

Run one:

luigi.run(
    cmdline_args=["--id='newId13822'", "--TaskTwo-id=2381"],
    main_task_cls=TaskTwo()
)

Run two:

luigi.run(
    cmdline_args=["--id='newId1322'", "--TaskTwo-id=231"],
    main_task_cls=TaskTwo()
)

Each task has a unique ID, as generated by luigi's task_id_str(...) method. Why does luigi think the task is already running when the luigi.Parameter values, the TaskTwo-id and the MockTarget files are all different?

Pipeline code:

import time
import uuid
from luigi.mock import MockTarget
import luigi


class TaskOne(luigi.Task):
    run_id = luigi.Parameter()

    def output(self):
        return MockTarget("TaskOne{0}".format(self.run_id), mirror_on_stderr=True)

    def run(self):
        _out = self.output().open('w')
        time.sleep(10)
        _out.write(u"Hello World!\n")
        _out.close()


class TaskTwo(luigi.Task):
    id = luigi.Parameter(default=str(uuid.uuid4()))

    def output(self):
        return MockTarget("TaskTwo{0}".format(self.id), mirror_on_stderr=True)

    def requires(self):
        return TaskOne(self.id)

    def run(self):
        _out = self.output().open('w')
        time.sleep(10)
        _out.write(u"Hello World!\n")
        _out.close()

Answer 1:


It looks like this might be because you are not connecting to a scheduler server, so it is trying to start a scheduler process twice. Are you running luigid?
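For context on the message itself: as far as I can tell, "Pid(s) ... already running" is printed by luigi's process lock, which derives a pid-file name from the command line of the running process. The sketch below is a simplified stdlib-only model of that idea, not luigi's actual code. The helper names, the script name pipeline.py and the pids other than 10362 are made up for illustration, and unlike the real lock it does not check whether the recorded pid is still alive.

```python
import hashlib
import os
import tempfile


def lock_file_for(cmdline, pid_dir):
    """Map a command line to a pid-file path (simplified model, hypothetical helper)."""
    digest = hashlib.md5(cmdline.encode("utf8")).hexdigest()
    return os.path.join(pid_dir, digest + ".pid")


def try_acquire(cmdline, pid, pid_dir):
    """Return True if the lock was taken, False if another pid already holds it."""
    path = lock_file_for(cmdline, pid_dir)
    if os.path.exists(path):
        with open(path) as fh:
            holders = sorted(int(line) for line in fh if line.strip())
        print("Pid(s) {0} already running".format(holders))
        return False
    with open(path, "w") as fh:
        fh.write("{0}\n".format(pid))
    return True


pid_dir = tempfile.mkdtemp()

# Two runs started from the same script share one command line, so they collide.
first = try_acquire("python pipeline.py", 10362, pid_dir)
second = try_acquire("python pipeline.py", 10400, pid_dir)

# A different command line hashes to a different lock file and is not blocked.
third = try_acquire("python pipeline.py --TaskTwo-id 231", 10401, pid_dir)
```

In this model the task parameters never enter the lock name when they are passed through luigi.run(cmdline_args=...) inside the script, which would explain why distinct IDs do not help. If that is what is happening here, luigi's --no-lock option may also be worth trying.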

I was able to get your code to run at the command line as follows. First I created a directory and saved your code there in a file called luigitest.py (minus the luigi.run() calls). I changed into that directory and ran:

luigid --background --pidfile ./luigid.pid --logdir . --state-path .

Then I opened up a second terminal in the same directory. In the first one I ran:

PYTHONPATH=. luigi --module luigitest TaskOne --run-id newId13822 --TaskTwo-id 2381 --local-scheduler

In the second one I ran (about a second later):

PYTHONPATH=. luigi --module luigitest TaskOne --run-id newId13823 --TaskTwo-id 2382 --local-scheduler

These both output "Hello World!"
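The two terminal commands can also be assembled and started from one Python script. This is only a sketch under a few assumptions: the module is luigitest.py as above, the luigi executable is on the PATH, and the helper names luigi_command and launch are hypothetical. Unlike the commands above it targets TaskTwo with --id and omits --local-scheduler by default, on the assumption that the runs should go through the luigid started earlier.

```python
import os
import subprocess


def luigi_command(module, task, params, local_scheduler=False):
    """Build the argv list for one `luigi` command-line invocation."""
    argv = ["luigi", "--module", module, task]
    for name, value in params.items():
        # Luigi exposes a parameter called run_id as the flag --run-id.
        argv += ["--" + name.replace("_", "-"), str(value)]
    if local_scheduler:
        argv.append("--local-scheduler")
    return argv


def launch(argv):
    """Start one run without waiting for it, so that several runs can overlap."""
    env = dict(os.environ, PYTHONPATH=".")  # same effect as the PYTHONPATH=. prefix
    return subprocess.Popen(argv, env=env)


# The two runs from the question, differing only in their ID.
run_one = luigi_command("luigitest", "TaskTwo", {"id": "2381"})
run_two = luigi_command("luigitest", "TaskTwo", {"id": "231"})
```

Calling launch(run_one) and then launch(run_two) starts both runs as separate processes. Because each process has its own command line, the IDs are visible outside the script rather than hidden inside luigi.run().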



Source: https://stackoverflow.com/questions/45034961/how-to-run-parallel-instances-of-a-luigi-pipeline-pid-set-already-running
