Streaming Command Failed! in RHADOOP

时光毁灭记忆、已成空白 submitted on 2019-12-01 16:44:46

Your current implementation runs from RStudio. Can you try writing the mapper and reducer as .R scripts and running them with Hadoop streaming directly?

hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming.jar -input file-in-hadoop -output hdfs_output_dir -file mapper_file -file reducer_file -mapper mapper.R -reducer reducer.R
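Before submitting to the cluster, it can help to run the same mapper and reducer locally, because Hadoop streaming pipes records through the scripts' stdin/stdout and sorts the map output by key before the reduce phase. The sketch below approximates that with a plain shell pipeline. `simulate_streaming` is a hypothetical helper, `sort` only approximates Hadoop's shuffle (there is no partitioning across reducers), and it assumes mapper.R and reducer.R read stdin and write lines to stdout.

```shell
#!/usr/bin/env bash
# simulate_streaming INPUT_FILE MAPPER_CMD REDUCER_CMD
# Hypothetical helper: runs a streaming mapper/reducer pair locally.
# The body is a subshell "( ... )" so that `set -o pipefail` does not
# leak into the caller's shell.
simulate_streaming() (
  input="$1"
  mapper="$2"
  reducer="$3"
  if [ ! -r "$input" ]; then
    echo "input file not readable: $input" >&2
    exit 1
  fi
  # Make the pipeline fail if ANY stage fails, not only the last one,
  # so a crashing mapper is not silently hidden by the reducer.
  set -o pipefail
  # map -> sort by key (stand-in for the shuffle) -> reduce
  sh -c "$mapper" < "$input" | LC_ALL=C sort | sh -c "$reducer"
)
```

For example, `simulate_streaming sample.txt "Rscript mapper.R" "Rscript reducer.R"` (where sample.txt is a small hypothetical excerpt of your input) surfaces R errors directly in the terminal instead of inside a failed task log.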

By the way, the PipeMapRed.waitOutputThreads() exception you are getting is raised when the streaming subprocess fails, and a common cause is an input or output path that is not specified correctly. Please check your paths.
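One way to check the paths is a quick pre-flight test before submitting the job. This is a minimal sketch, assuming the `hadoop` CLI (or whatever HADOOP_CMD points to) is available. `check_streaming_paths` is a hypothetical helper; it relies on `hadoop fs -test -e`, which exits with status 0 when a path exists. It also flags an output directory that already exists, since MapReduce refuses to write into an existing output directory.

```shell
#!/usr/bin/env bash
# check_streaming_paths INPUT_PATH OUTPUT_PATH
# Hypothetical pre-flight check for a streaming job.
# HADOOP_CMD defaults to the `hadoop` found on the PATH.
check_streaming_paths() {
  local input="$1" output="$2"
  local hadoop="${HADOOP_CMD:-hadoop}"
  local status=0
  # The input path must already exist in HDFS.
  if ! "$hadoop" fs -test -e "$input"; then
    echo "input path not found: $input" >&2
    status=1
  fi
  # The output directory must NOT exist yet; the job will not overwrite it.
  if "$hadoop" fs -test -e "$output"; then
    echo "output path already exists: $output" >&2
    status=1
  fi
  return "$status"
}
```

With the placeholders from the command above, that would be `check_streaming_paths file-in-hadoop hdfs_output_dir`.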

This should work.

Your code worked fine for me once I changed HADOOP_CMD and HADOOP_STREAMING to match my system configuration (I'm running Hadoop 2.4.0 on Ubuntu 14.04).
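Because rhdfs and rmr2 read HADOOP_CMD and HADOOP_STREAMING from the environment, one way to rule out a wrong path is to export both variables in the shell you start R from and verify them first. A minimal sketch; `check_rhadoop_env` is a hypothetical helper, not part of RHadoop.

```shell
#!/usr/bin/env bash
# check_rhadoop_env: hypothetical helper that verifies the two variables
# rhdfs/rmr2 rely on point at real files before R is started.
check_rhadoop_env() {
  local status=0
  # HADOOP_CMD must be an executable (the `hadoop` launcher script).
  if [ ! -x "${HADOOP_CMD:-}" ]; then
    echo "HADOOP_CMD is unset or not executable: ${HADOOP_CMD:-<unset>}" >&2
    status=1
  fi
  # HADOOP_STREAMING must be an existing file (the streaming jar).
  if [ ! -f "${HADOOP_STREAMING:-}" ]; then
    echo "HADOOP_STREAMING is unset or not a file: ${HADOOP_STREAMING:-<unset>}" >&2
    status=1
  fi
  return "$status"
}
```

For example, export the two paths used in the R code of this answer (adjusted to your installation), then run `check_rhadoop_env && R`.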

My suggestions:

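  • Ensure that a working Hadoop instance is running, i.e., the jps command in your terminal should list the Hadoop daemons (on a single-node Hadoop 2.x setup: NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager).
  • Ensure that the rJava package gets loaded when you call library(rhdfs).
  • Ensure that HADOOP_STREAMING refers to the correct streaming jar file for your Hadoop version.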
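The first and last checks can be scripted. This is a sketch under stated assumptions, not part of RHadoop: `missing_daemons` (hypothetical) reads `jps` output and reports which of NameNode, DataNode, ResourceManager and NodeManager, the daemons assumed here for a single-node Hadoop 2.x/YARN setup, are absent; `find_streaming_jar` (hypothetical) searches a Hadoop installation for the streaming jar so HADOOP_STREAMING can point at a file that actually exists.

```shell
#!/usr/bin/env bash
# missing_daemons: reads `jps` output on stdin and prints each expected
# daemon that is not listed. Returns non-zero if any daemon is missing.
missing_daemons() {
  local listing d status=0
  listing="$(cat)"
  for d in NameNode DataNode ResourceManager NodeManager; do
    # jps prints "<pid> <MainClass>"; compare the whole second field so
    # that "SecondaryNameNode" is not mistaken for "NameNode".
    if ! printf '%s\n' "$listing" | awk -v d="$d" '$2 == d { found = 1 } END { exit !found }'; then
      echo "$d"
      status=1
    fi
  done
  return "$status"
}

# find_streaming_jar [HADOOP_HOME_DIR]: prints the first hadoop-streaming
# jar found under the directory, skipping *-sources.jar archives.
find_streaming_jar() {
  local home="${1:-$HADOOP_HOME}"
  find "$home" -type f -name 'hadoop-streaming*.jar' ! -name '*-sources.jar' 2>/dev/null | head -n 1
}
```

Typical use: `jps | missing_daemons` to see which daemons still need to be started, and `export HADOOP_STREAMING="$(find_streaming_jar /usr/local/hadoop)"` before launching R.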

Below are the R code and its output:

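# Point rhdfs/rmr2 at the Hadoop launcher and the streaming jar (adjust both paths to your installation)
Sys.setenv("HADOOP_CMD"="/usr/local/hadoop/bin/hadoop")
Sys.setenv("HADOOP_STREAMING"="/usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.4.0.jar")

library(rhdfs)
# Loading required package: rJava
# HADOOP_CMD=/usr/local/hadoop/bin/hadoop
# Be sure to run hdfs.init()

hdfs.init()   # initialize rhdfs's connection to HDFS
library(rmr2)
# Write the integers 1..10 to a temporary file on HDFS
ints = to.dfs(1:10)
# Map-only job: for each block of values v, return a two-column matrix of v and 2*v
calc = mapreduce(input = ints, map = function(k, v) cbind(v, 2*v))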

Output:

15/04/07 05:18:44 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/04/07 05:18:45 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
packageJobJar: [/usr/local/hadoop/data/hadoop-unjar1328285833881826794/] [] /tmp/streamjob6167004817219806828.jar tmpDir=null
15/04/07 05:18:47 INFO client.RMProxy: Connecting to ResourceManager at localhost/127.0.0.1:8050
15/04/07 05:18:47 INFO client.RMProxy: Connecting to ResourceManager at localhost/127.0.0.1:8050
15/04/07 05:18:48 INFO mapred.FileInputFormat: Total input paths to process : 1
15/04/07 05:18:49 INFO mapreduce.JobSubmitter: number of splits:2
15/04/07 05:18:49 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1428363713092_0002
15/04/07 05:18:49 INFO impl.YarnClientImpl: Submitted application application_1428363713092_0002
15/04/07 05:18:50 INFO mapreduce.Job: The url to track the job: http://manohar-dt:8088/proxy/application_1428363713092_0002/
15/04/07 05:18:50 INFO mapreduce.Job: Running job: job_1428363713092_0002
15/04/07 05:19:00 INFO mapreduce.Job: Job job_1428363713092_0002 running in uber mode : false
15/04/07 05:19:00 INFO mapreduce.Job:  map 0% reduce 0%
15/04/07 05:19:15 INFO mapreduce.Job:  map 50% reduce 0%
15/04/07 05:19:16 INFO mapreduce.Job:  map 100% reduce 0%
15/04/07 05:19:17 INFO mapreduce.Job: Job job_1428363713092_0002 completed successfully
15/04/07 05:19:17 INFO mapreduce.Job: Counters: 30
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=194356
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=979
        HDFS: Number of bytes written=919
        HDFS: Number of read operations=14
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=4
    Job Counters 
        Launched map tasks=2
        Data-local map tasks=2
        Total time spent by all maps in occupied slots (ms)=25803
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=25803
        Total vcore-seconds taken by all map tasks=25803
        Total megabyte-seconds taken by all map tasks=26422272
    Map-Reduce Framework
        Map input records=3
        Map output records=3
        Input split bytes=186
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=293
        CPU time spent (ms)=3640
        Physical memory (bytes) snapshot=322818048
        Virtual memory (bytes) snapshot=2107604992
        Total committed heap usage (bytes)=223346688
    File Input Format Counters 
        Bytes Read=793
    File Output Format Counters 
        Bytes Written=919
15/04/07 05:19:17 INFO streaming.StreamJob: Output directory: /tmp/file11d247219866

Hope this helps.
