Reading/writing files from HDFS using Python with subprocess, Pipe, Popen gives an error

Submitted by 独自空忆成欢 on 2019-11-30 16:45:15

Try changing your put subprocess so that it reads the cat process's stdout directly. Change this

put = Popen(["hadoop", "fs", "-put", "-", "./modifiedfile.txt"],
            stdin=PIPE)

into this

put = Popen(["hadoop", "fs", "-put", "-", "./modifiedfile.txt"],
            stdin=cat.stdout)

Full script:

#!/usr/bin/env python

from subprocess import Popen, PIPE

# Stream the file out of HDFS...
cat = Popen(["hadoop", "fs", "-cat", "./sample.txt"],
            stdout=PIPE)

# ...and pipe it straight into the put process.
put = Popen(["hadoop", "fs", "-put", "-", "./modifiedfile.txt"],
            stdin=cat.stdout)
cat.stdout.close()  # let cat receive SIGPIPE if put exits early
put.communicate()
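The essential pattern is wiring one process's stdout into the next process's stdin and then closing the parent's copy of the pipe. A minimal sketch of the same pattern, with ordinary Unix commands standing in for the hadoop calls (printf and tr here are placeholders chosen for illustration, not part of the original answer):

```python
from subprocess import Popen, PIPE

# printf/tr stand in for "hadoop fs -cat" / "hadoop fs -put".
producer = Popen(["printf", "line1\nline2\n"], stdout=PIPE)
consumer = Popen(["tr", "a-z", "A-Z"], stdin=producer.stdout, stdout=PIPE)
producer.stdout.close()  # let producer see SIGPIPE if consumer exits first
out, _ = consumer.communicate()
print(out.decode())
```

Closing `producer.stdout` in the parent matters: otherwise the producer never receives SIGPIPE if the consumer dies, and the pipeline can hang.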

Can someone please tell me what I am doing wrong here?

Your sample.py might not be a proper mapper. A mapper reads its input from stdin and writes its result to stdout, e.g., blah.py:

#!/usr/bin/env python
import sys

# Emit a "Blah" line after every input line, i.e. the same output as
# print("Blah\n".join(sys.stdin) + "Blah\n")
for line in sys.stdin:
    line += "Blah"
    print(line)

Usage:

$ hadoop ... -file ./blah.py -mapper './blah.py' -input sample.txt -output fileRead
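A mapper like this can be checked locally before submitting the streaming job. This sketch reproduces blah.py's loop in-process (run_mapper is a hypothetical helper for testing, not part of the Hadoop streaming API):

```python
import io

def run_mapper(text):
    # Simulate piping `text` into blah.py on stdin: each input line keeps
    # its newline, gets "Blah" appended, and print() adds a final newline,
    # so every input line is followed by a "Blah" line.
    out = io.StringIO()
    for line in io.StringIO(text):
        line += "Blah"
        out.write(line + "\n")
    return out.getvalue()

print(run_mapper("foo\nbar\n"))  # "foo\nBlah\nbar\nBlah\n"
```

Equivalently, on the command line: printf "foo\nbar\n" | ./blah.py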