Question
I have a script that reads text line by line, modifies each line slightly, and then writes the line to a file. Reading the input works fine; the problem is that I cannot write the output. Here is my code.
cat = subprocess.Popen(["hadoop", "fs", "-cat", "/user/test/myfile.txt"], stdout=subprocess.PIPE)
for line in cat.stdout:
    line = line+"Blah"
    subprocess.Popen(["hadoop", "fs", "-put", "/user/test/moddedfile.txt"], stdin=line)
This is the error I am getting.
AttributeError: 'str' object has no attribute 'fileno'
cat: Unable to write to output stream.
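The AttributeError comes from Popen itself: anything passed as stdin that is not PIPE, None or an integer gets its .fileno() method called, and a str has none. The sketch below reproduces that locally and shows one way to hand a string to a child process instead. It assumes Python 3 and replaces the hadoop command with a hypothetical stand-in (a Python child that echoes its stdin), so it runs without a cluster.

```python
import subprocess
import sys

# Hypothetical stand-in for the hadoop command: a Python child that echoes stdin.
ECHO = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read())"]


def reproduce_error():
    """Pass a plain string as stdin, as the question does, and return the message."""
    try:
        subprocess.Popen(ECHO, stdin="some text\n")
    except AttributeError as exc:
        return str(exc)
    return None


def send_string(text):
    """Send the string through a pipe instead; run() writes `input`, then closes stdin."""
    result = subprocess.run(ECHO, input=text.encode(), stdout=subprocess.PIPE, check=True)
    return result.stdout.decode()


print(reproduce_error())  # 'str' object has no attribute 'fileno'
print(send_string("some text\n"), end="")  # some text
```

Note that the data has to be bytes (or the call must use text=True) in Python 3; the question's code predates that distinction.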
Answer 1:
A quick and dirty way to make your code work:
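import subprocess
from tempfile import NamedTemporaryFile

cat = subprocess.Popen(["hadoop", "fs", "-cat", "/user/test/myfile.txt"],
                       stdout=subprocess.PIPE)
# Buffer the modified lines in a local temporary file (binary mode by default).
with NamedTemporaryFile() as f:
    for line in cat.stdout:
        # Strip the newline first so "Blah" stays on the same line.
        f.write(line.rstrip(b'\n') + b'Blah\n')
    f.flush()  # make sure everything is on disk before hadoop reads the file
    cat.wait()
    # -put uploads the local file by path, so no stdin is needed; run it
    # inside the with-block, before the temporary file is deleted.
    put = subprocess.Popen(["hadoop", "fs", "-put", f.name, "/user/test/moddedfile.txt"])
    put.wait()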
That said, I suggest looking at the HDFS/WebHDFS Python libraries, for example pywebhdfs.
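The temporary-file approach can be tried without a Hadoop cluster. The sketch below wraps it in a hypothetical helper, copy_through_tempfile, and swaps both hadoop commands for Python stand-ins: the producer prints two lines, and the consumer copies the local file (passed by path, as with `hadoop fs -put <local> <dest>`) to an output path. It assumes Python 3 on a POSIX system, where a NamedTemporaryFile can be reopened by another process while still open.

```python
import os
import subprocess
import sys
import tempfile


def copy_through_tempfile(src_cmd, make_dst_cmd, suffix=b"Blah"):
    """Append `suffix` to every line of src_cmd's output, buffer it in a temp file,
    then run make_dst_cmd(temp_path), which plays the role of `hadoop fs -put`."""
    src = subprocess.Popen(src_cmd, stdout=subprocess.PIPE)
    with tempfile.NamedTemporaryFile() as tmp:  # binary mode ('w+b') by default
        for line in src.stdout:
            tmp.write(line.rstrip(b"\n") + suffix + b"\n")
        tmp.flush()  # the consumer opens the file by name, so flush first
        src.stdout.close()
        if src.wait() != 0:
            raise subprocess.CalledProcessError(src.returncode, src_cmd)
        # The consumer must finish before the with-block deletes the file.
        subprocess.run(make_dst_cmd(tmp.name), check=True)


# Hypothetical stand-ins for `hadoop fs -cat` and `hadoop fs -put <local> <dest>`.
out_path = os.path.join(tempfile.mkdtemp(), "moddedfile.txt")
SRC = [sys.executable, "-c", "print('first'); print('second')"]


def put_cmd(local_path):
    return [sys.executable, "-c",
            "import shutil, sys; shutil.copy(sys.argv[1], sys.argv[2])",
            local_path, out_path]


copy_through_tempfile(SRC, put_cmd)
with open(out_path, "rb") as f:
    print(f.read())  # b'firstBlah\nsecondBlah\n' on POSIX
```

The local copy costs disk space equal to the whole file, which is the main reason to prefer streaming for large inputs.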
Answer 2:
The stdin argument doesn't accept a string. It should be PIPE, None, or an existing file (something with a valid .fileno(), or an integer file descriptor).
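To illustrate the accepted kinds of stdin value, the sketch below feeds the same hypothetical echo child through a PIPE, a real file object and a raw integer descriptor from os.pipe(). (None, the fourth option, simply makes the child inherit the parent's stdin.) It assumes Python 3; the echo child stands in for the hadoop command.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical stand-in child: echoes whatever arrives on stdin back to stdout.
ECHO = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read())"]

# 1. stdin=PIPE: the parent writes bytes into the child's stdin itself.
with subprocess.Popen(ECHO, stdin=subprocess.PIPE, stdout=subprocess.PIPE) as proc:
    out_pipe, _ = proc.communicate(b"via PIPE\n")

# 2. stdin=<file object>: Popen calls .fileno() on it, so a real file works.
with tempfile.TemporaryFile() as f:
    f.write(b"via file object\n")
    f.flush()
    f.seek(0)  # the child reads from the shared descriptor's current offset
    out_file = subprocess.run(ECHO, stdin=f, stdout=subprocess.PIPE).stdout

# 3. stdin=<int>: a raw file descriptor, here the read end of an os.pipe().
r, w = os.pipe()
os.write(w, b"via descriptor\n")
os.close(w)  # close the write end so the child sees EOF
out_fd = subprocess.run(ECHO, stdin=r, stdout=subprocess.PIPE).stdout
os.close(r)

print(out_pipe, out_file, out_fd)
```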
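from subprocess import Popen, PIPE

cat = Popen(["hadoop", "fs", "-cat", "/user/test/myfile.txt"],
            stdout=PIPE, bufsize=-1)
# A source of "-" tells `hadoop fs -put` to read the data from its stdin.
put = Popen(["hadoop", "fs", "-put", "-", "/user/test/moddedfile.txt"],
            stdin=PIPE, bufsize=-1)
for line in cat.stdout:
    # Keep "Blah" on the same line as the text it modifies.
    line = line.rstrip(b"\n") + b"Blah\n"
    put.stdin.write(line)
cat.stdout.close()
cat.wait()
put.stdin.close()  # EOF tells hadoop the upload is complete
put.wait()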
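The streaming pipeline can also be exercised locally. The sketch below wraps it in a hypothetical helper, stream_transform, and replaces `hadoop fs -cat` and `hadoop fs -put - <dest>` with Python stand-ins; the consumer copies its stdin to an output file. It assumes Python 3 and POSIX line endings, and reaps both children before reporting a failure.

```python
import os
import subprocess
import sys
import tempfile


def stream_transform(src_cmd, dst_cmd, suffix=b"Blah"):
    """Pipe src_cmd's stdout into dst_cmd's stdin, appending `suffix` to each line.

    Same shape as the cat/put pipeline above: nothing is written to local disk."""
    src = subprocess.Popen(src_cmd, stdout=subprocess.PIPE)
    dst = subprocess.Popen(dst_cmd, stdin=subprocess.PIPE)
    for line in src.stdout:
        dst.stdin.write(line.rstrip(b"\n") + suffix + b"\n")
    src.stdout.close()
    dst.stdin.close()  # EOF lets the consumer finish
    # Wait for both children before checking, so neither is left unreaped.
    results = [(src.wait(), src_cmd), (dst.wait(), dst_cmd)]
    for code, cmd in results:
        if code != 0:
            raise subprocess.CalledProcessError(code, cmd)


# Hypothetical stand-ins for `hadoop fs -cat <file>` and `hadoop fs -put - <dest>`.
out_path = os.path.join(tempfile.mkdtemp(), "moddedfile.txt")
SRC = [sys.executable, "-c", "print('first'); print('second')"]
DST = [sys.executable, "-c",
       "import shutil, sys\n"
       "with open(sys.argv[1], 'wb') as out:\n"
       "    shutil.copyfileobj(sys.stdin.buffer, out)",
       out_path]

stream_transform(SRC, DST)
with open(out_path, "rb") as f:
    print(f.read())  # b'firstBlah\nsecondBlah\n' on POSIX
```

Both children run concurrently here; the consumer's stdout is deliberately left unpiped, since a pipe that nobody reads could fill up and stall the whole chain.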
Source: https://stackoverflow.com/questions/22349733/outputting-to-a-file-in-hdfs-using-a-subprocess