How does one specify the input file for a runner from Python?

て烟熏妆下的殇ゞ 提交于 2019-12-09 18:11:34

问题


I am writing an external script to run a mapreduce job via the Python mrjob module on my laptop (not on Amazon Elastic Compute Cloud or any large cluster).

I read from the mrjob documentation that I should use MRJob.make_runner() to run a mapreduce job from a separate python script as follows.

mr_job = MRYourJob(args=['-r', 'emr'])
with mr_job.make_runner() as runner:
    ...

However, how do I specify which input file to use? I want to use a file "datalines.txt" in the same directory as my mapreduce script and other python script that runs the map reduce. Furthermore, how do I specify the output?

I could not find a function in the mrjob documentation that allows me to specify these parameters.


回答1:


Getting started guide suggests that the input is read from stdin or files supplied at the command-line:

mr_job = MRYourJob(args=["datalines.txt"])


来源:https://stackoverflow.com/questions/12569261/how-does-one-specify-the-input-file-for-a-runner-from-python

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!