Question
I am trying to read (open) and write files in HDFS from inside a Python script, but I am getting an error. Can someone tell me what is wrong here?
Code (full): sample.py
#!/usr/bin/python
from subprocess import Popen, PIPE

print "Before Loop"
cat = Popen(["hadoop", "fs", "-cat", "./sample.txt"],
            stdout=PIPE)
print "After Loop 1"
put = Popen(["hadoop", "fs", "-put", "-", "./modifiedfile.txt"],
            stdin=PIPE)
print "After Loop 2"
for line in cat.stdout:
    line += "Blah"
    print line
    print "Inside Loop"
    put.stdin.write(line)

cat.stdout.close()
cat.wait()
put.stdin.close()
put.wait()
When I execute:
hadoop jar /usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.5.1.jar -file ./sample.py -mapper './sample.py' -input sample.txt -output fileRead
it runs without errors, but I cannot find the file it is supposed to create in HDFS (modifiedfile.txt).
And when I execute:
hadoop fs -getmerge ./fileRead/ file.txt
inside file.txt I get:
Before Loop
Before Loop
After Loop 1
After Loop 1
After Loop 2
After Loop 2
Can someone please tell me what I am doing wrong here? I don't think it reads from sample.txt at all.
Answer 1:
Try changing your put subprocess so that it takes cat's stdout directly as its stdin, by changing this
put = Popen(["hadoop", "fs", "-put", "-", "./modifiedfile.txt"],
            stdin=PIPE)
into this
put = Popen(["hadoop", "fs", "-put", "-", "./modifiedfile.txt"],
            stdin=cat.stdout)
Full script:
#!/usr/bin/python
from subprocess import Popen, PIPE

print "Before Loop"
cat = Popen(["hadoop", "fs", "-cat", "./sample.txt"],
            stdout=PIPE)
print "After Loop 1"
put = Popen(["hadoop", "fs", "-put", "-", "./modifiedfile.txt"],
            stdin=cat.stdout)
put.communicate()
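Note that the script above copies sample.txt unchanged, because the step that appended "Blah" in the question is no longer there. If that transformation is still wanted, a rough Python 3 sketch could stream each line through a function between the two processes. The names pipe_with_transform and append_blah are made up for illustration and do not come from the original post; the hadoop commands are the ones used in the question.

```python
#!/usr/bin/env python3
import subprocess


def pipe_with_transform(read_cmd, write_cmd, transform):
    """Feed read_cmd's stdout, line by line, through transform into write_cmd's stdin.

    Hypothetical helper for illustration. Returns both exit codes.
    """
    reader = subprocess.Popen(read_cmd, stdout=subprocess.PIPE)
    writer = subprocess.Popen(write_cmd, stdin=subprocess.PIPE)
    for raw in reader.stdout:          # raw is bytes, including the newline
        writer.stdin.write(transform(raw))
    reader.stdout.close()
    writer.stdin.close()               # signals end of input to the writer
    return reader.wait(), writer.wait()


def append_blah(raw):
    # Assumption: the suffix belongs before the newline, not after it.
    return raw.rstrip(b"\n") + b"Blah\n"
```

Under these assumptions, calling pipe_with_transform(["hadoop", "fs", "-cat", "./sample.txt"], ["hadoop", "fs", "-put", "-", "./modifiedfile.txt"], append_blah) would reproduce what the question's loop was trying to do. Checking the two returned exit codes is a cheap way to notice when either hadoop command failed.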
Answer 2:
Can someone please tell me what I am doing wrong here ??
Your sample.py might not be a proper mapper. A mapper probably accepts its input on stdin and writes the result to its stdout, e.g., blah.py:
#!/usr/bin/env python
import sys

for line in sys.stdin:  # print("Blah\n".join(sys.stdin) + "Blah\n")
    line += "Blah"
    print(line)
Usage:
$ hadoop ... -file ./blah.py -mapper './blah.py' -input sample.txt -output fileRead
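One detail worth illustrating: each line read from sys.stdin still ends with its newline, so appending to it puts "Blah" at the start of the next output line, and print then adds a second newline. The variant below is only a sketch, assuming the intent is to have "Blah" at the end of each line; add_suffix is a hypothetical name, not something from the answer. Because everything a streaming mapper writes to stdout ends up in the job output (which is why the "Before Loop" messages appeared in file.txt), any debug messages would be better sent to sys.stderr.

```python
#!/usr/bin/env python3
import sys


def add_suffix(lines, suffix="Blah"):
    """Yield each line with suffix placed before its trailing newline."""
    for line in lines:
        yield line.rstrip("\n") + suffix + "\n"


def main():
    # Mapper contract: read records from stdin, write results to stdout.
    sys.stdout.writelines(add_suffix(sys.stdin))
```

To use this as the mapper script, call main() at the bottom of the file, typically under an if __name__ == "__main__": guard. Keeping the transformation in its own generator makes it easy to check locally with a plain list of strings before submitting a job.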
Source: https://stackoverflow.com/questions/28139406/reading-writing-files-from-hdfs-using-python-with-subprocess-pipe-popen-give