Question
I'm trying to run a very simple task with MapReduce.
mapper.py:
#!/usr/bin/env python
import sys
for line in sys.stdin:
print line
My txt file:
qwerty
asdfgh
zxc
Command line to run the job:
hadoop jar /usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.6.0-mr1-cdh5.8.0.jar \
-input /user/cloudera/In/test.txt \
-output /user/cloudera/test \
-mapper /home/cloudera/Documents/map.py \
-file /home/cloudera/Documents/map.py
Error:
INFO mapreduce.Job: Task Id : attempt_1490617885665_0008_m_000001_0, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 127
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:325)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:538)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
How do I fix this and run the code?
When I run cat /home/cloudera/Documents/test.txt | python /home/cloudera/Documents/map.py locally,
it works fine.
UPDATE
Something was wrong with my *.py file. I copied the file from the GitHub repo for Tom White's Hadoop book and now everything works fine.
But I can't understand the reason. It is not the permissions or the charset (if I'm not mistaken). What else could it be?
Answer 1:
I faced the same problem.
Issue: when the Python file is created in a Windows environment, its newline character is CRLF, while my Hadoop runs on Linux, which expects LF.
Solution: after changing CRLF to LF, the step ran successfully.
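For context, exit code 127 means "command not found": Hadoop Streaming executes the mapper file directly, so with CRLF endings the shebang effectively asks for an interpreter named "python\r", which does not exist. Running python map.py locally bypasses the shebang, which is why the asker's local test passed. The endings can be fixed with dos2unix map.py, or with a minimal Python sketch like the following (the path is the one from the question):
# Minimal sketch: rewrite map.py with LF line endings.
# Read the file as bytes so the CR characters can be removed reliably.
path = "/home/cloudera/Documents/map.py"
with open(path, "rb") as f:
    data = f.read()
with open(path, "wb") as f:
    f.write(data.replace(b"\r\n", b"\n"))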
Answer 2:
The -mapper
argument should be the command that will run on the cluster nodes, and there is no /home/cloudera/Documents/map.py file on those nodes.
Files that you pass with the -file
option are placed in the task's working directory, so you can simply refer to the script as ./map.py
I don't remember what permissions are set on that file, so if it has no execute permission, run it as python map.py
So the full command is:
hadoop jar /usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.6.0-mr1-cdh5.8.0.jar \
-input /user/cloudera/In/test.txt \
-output /user/cloudera/test \
-mapper "python map.py" \
-file /home/cloudera/Documents/map.py
Answer 3:
You have an error in your mapper.py or reducer.py. For example:
- Not using #!/usr/bin/env python at the top of the file.
- A syntax or logical error in your Python code (for example, print has different syntax in Python 2 and Python 3).
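As an illustration (not from the original answer), an identity mapper that behaves the same under Python 2 and Python 3 could look like this:
#!/usr/bin/env python
# Identity mapper sketch that runs under both Python 2 and Python 3:
# print_function makes print() a function in Python 2 as well.
from __future__ import print_function
import sys

for line in sys.stdin:
    # strip the trailing newline so print() does not emit extra blank lines
    print(line.rstrip("\n"))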
Answer 4:
First check python --version
. If the output of python --version
is
Command 'python' not found, but can be installed with:
sudo apt install python3
sudo apt install python
sudo apt install python-minimal
You also have python3 installed, you can run 'python3' instead.
then install Python with sudo apt install python
and run your Hadoop job again.
That is what finally worked on my PC.
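Alternatively, since the message itself says python3 is already available, you could point the mapper at python3 instead of installing the python 2 alias. A minimal sketch, assuming the script otherwise uses Python 3 compatible syntax:
#!/usr/bin/env python3
# Same identity mapper, pinned to the python3 interpreter via the shebang;
# print() is used because Python 3 has no print statement.
import sys

for line in sys.stdin:
    print(line.rstrip("\n"))
You could also keep the original shebang and pass -mapper "python3 map.py" instead, as in Answer 2.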
Source: https://stackoverflow.com/questions/43048654/hadoop-python-subprocess-failed-with-code-127