hadoop-streaming

How to run MR job in Normal privilege

跟風遠走 · submitted on 2019-12-13 03:35:49

Question: I have installed Hadoop 2.3.0 and can execute MR jobs successfully. But when I try to execute MR jobs with normal (non-admin) privileges, the job fails with the following exception. I tried the "WordCount.jar" sample.

14/10/28 09:16:12 INFO mapreduce.Job: Task Id : attempt_1414467725299_0002_r_000000_1, Status : FAILED
Error: java.lang.NullPointerException
    at org.apache.hadoop.mapred.Task.getFsStatistics(Task.java:347)
    at org.apache.hadoop.mapred.ReduceTask…

Python hadoop on windows cmd, one mapper and multiple inputs, Error: subprocess failed

拈花ヽ惹草 · submitted on 2019-12-13 02:44:43

Question: I want to execute a Python file related to machine learning, and as you know there are two input files (train and test) needed for the learning process. Also, I have no reducer. I have three doubts about running my command: (1) to use two input files, I used -input file1 -input file2, following "Using multiple mapper inputs in one streaming job on hadoop?"; (2) to turn off the reduce phase, I used -D mapred.reduce.tasks=0, following "How to write 'map only' hadoop jobs?"; (3) how to flush my "sys…

Running the Python Code on Hadoop Failed

余生颓废 · submitted on 2019-12-12 10:39:25

Question: I have tried to follow the instructions on this page: http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/

$ bin/hadoop jar contrib/streaming/hadoop-streaming-1.0.4.jar -input /user/root/wordcountpythontxt -output /user/root/wordcountpythontxt-output -mapper /user/root/wordcountpython/mapper.py -reducer /user/root/wordcountpython/reducer.py -file /user/root/mapper.py -file /user/root/reducer.py

It says "File: /user/root/mapper.py does not exist, or is not…
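Before submitting a job like the one in that tutorial, the word-count logic can be rehearsed without a cluster. Below is a sketch of the classic mapper/reducer pair written as functions (hypothetical names; the tutorial's real scripts read sys.stdin and print to stdout):

```python
from itertools import groupby

def mapper(lines):
    """Emit one (word, 1) pair per word, word-count style."""
    for line in lines:
        for word in line.strip().split():
            yield (word, 1)

def reducer(pairs):
    """Sum counts per word; pairs must arrive sorted by word,
    which Hadoop's shuffle/sort phase guarantees."""
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield (word, sum(count for _, count in group))
```

Locally the same pipeline is `cat input.txt | ./mapper.py | sort | ./reducer.py`. Note that -file must point to paths on the local filesystem (not HDFS) so the scripts ship with the job, and the scripts must be executable (chmod +x) with a valid shebang.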

hadoop /usr/bin/env: python: No such file or directory

拟墨画扇 · submitted on 2019-12-12 02:53:58

Question: I am trying to run Hadoop streaming jobs with the following commands from a shell script:

hadoop jar /usr/local/hadoop/contrib/streaming/hadoop-0.19.2-streaming.jar -input $1 -output Twitter/Net.pegasus -mapper 'mapper.py Reverse' -reducer NONE -file mapper.py

hadoop jar /usr/local/hadoop/contrib/streaming/hadoop-0.19.2-streaming.jar -input $1 -output Twitter/Net.exclude -mapper 'mapper.py Reverse' -reducer reducer.py -file mapper.py -file reducer.py -file ../twitter/exclude.txt

hadoop jar /usr…
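The "/usr/bin/env: python: No such file or directory" error usually means the script's shebang resolves on the task nodes, where `python` is not on PATH. A hedged sketch of what such a mapper.py might look like (reverse_fields is a made-up stand-in for the question's 'mapper.py Reverse' mode):

```python
#!/usr/bin/env python
# The shebang above is resolved on each task node, not on the client.
# If "python" is not on PATH there, streaming fails with
# "/usr/bin/env: python: No such file or directory". Workarounds:
# pin an absolute interpreter path in the shebang, or invoke the
# script explicitly, e.g. -mapper 'python mapper.py Reverse'.
import sys

def reverse_fields(line):
    """Hypothetical 'Reverse' mode: emit the tab-separated fields
    of a line in reverse order."""
    return "\t".join(reversed(line.rstrip("\n").split("\t")))

if __name__ == "__main__":
    for line in sys.stdin:
        print(reverse_fields(line))
```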

Running external python lib like (NLTK) with hadoop streaming

我的未来我决定 · submitted on 2019-12-12 02:52:19

Question: I tried following http://blog.cloudera.com/blog/2008/11/sending-files-to-remote-task-nodes-with-hadoop-mapreduce/:

zip -r nltkandyaml.zip nltk yaml
mv nltkandyaml.zip /path/to/where/your/mapper/will/be/nltkandyaml.mod

import zipimport
importer = zipimport.zipimporter('nltkandyaml.mod')
yaml = importer.load_module('yaml')
nltk = importer.load_module('nltk')

And the error I got is:

job_201406080403_3863/attempt_201406080403_3863_m_000000_0/work/./app/mapper.py", line 12, in <module> import nltk ImportError:…
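The zip-import mechanism the Cloudera post relies on can be demonstrated self-contained: build a zip containing a module, then import from it. The sketch below puts the zip on sys.path (Python's built-in zipimport handles zip entries there), which is the same machinery the post drives explicitly via zipimport.zipimporter(...).load_module. The module name "shippedmod" and its greet() function are made up for illustration.

```python
import os
import sys
import tempfile
import zipfile

# Build a zip archive containing a tiny module, standing in for the
# nltkandyaml.mod bundle from the question (the extension is irrelevant
# to zipimport; only the zip format matters).
workdir = tempfile.mkdtemp()
zip_path = os.path.join(workdir, "nltkandyaml.mod")

with zipfile.ZipFile(zip_path, "w") as zf:
    zf.writestr("shippedmod.py", "def greet():\n    return 'hello from zip'\n")

# Putting the archive on sys.path lets the normal import statement
# load modules straight out of the zip.
sys.path.insert(0, zip_path)
import shippedmod
```

For the question's ImportError, the usual catch is that the zip must contain the package directories at its top level (zip -r from the directory that contains nltk/ and yaml/), or the names will not resolve inside the archive.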

Finding hostname of slave nodes in hadoop during execution of running map-reduce

陌路散爱 · submitted on 2019-12-11 20:00:21

Question: I want to know how map-reduce code executes on a Hadoop 2.9.0 multi-node cluster. I want to understand which node processes which input; specifically, how can I find out which mapper processed each part of the input data? I executed the following Python code on the master:

import sys
import socket
for line in sys.stdin:
    line = line.strip()
    words = line.split()
    for word in words:
        print('%s\t%s\t%s' % (word, 1, socket.gethostname()))

I used socket.gethostname() to find the hostname of the nodes. I expected output of…
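The mapper above can be restructured so the per-word tagging is a pure function, which makes the logic checkable without a cluster (tag_words is a made-up helper name; the behavior matches the question's code):

```python
import socket
import sys

def tag_words(line, host):
    """Emit one 'word<TAB>1<TAB>host' record per word of the line,
    so the reducer side can see which node mapped each word."""
    return ["%s\t%s\t%s" % (word, 1, host) for word in line.strip().split()]

if __name__ == "__main__":
    # gethostname() runs on the task node, not the master, so each
    # map task stamps its records with the node that processed them.
    host = socket.gethostname()
    for line in sys.stdin:
        for record in tag_words(line, host):
            print(record)
```

Each input split is assigned to one map task, so grouping the output by the third column shows which node handled which portion of the input.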

hadoop-streaming : reduce task in pending state says “No room for reduce task.”

孤街醉人 · submitted on 2019-12-11 19:30:06

Question: My map tasks complete successfully and I can see the application logs, but the reducer stays in pending state:

Kind     % Complete  Num Tasks  Pending  Running  Complete  Killed  Failed/Killed Task Attempts
map      100.00%     200        0        0        200       0      0 / 40
reduce   0.00%       1          1        0        0         0      0 / 0

When I look at the reduce task, I see "All Task Attempts: No Task Attempts found". When I check hadoop-hduser-jobtracker-master.log, I see the following:

2011-10-31 00:00:00,238 WARN org.apache.hadoop.mapred.JobInProgress: No room for reduce task.…

Storm UI topology not working

别说谁变了你拦得住时间么 · submitted on 2019-12-11 11:11:57

Question: We are executing a Storm topology in LocalCluster. The topology runs fine and we can connect to the Storm UI (port 8090), but the UI does not display the running topology information.

LocalCluster cluster = new LocalCluster();

and we submit it like:

bin/storm jar bin/StormTest-0.0.1-SNAPSHOT.jar com.abzooba.storm.twitter.TwitterTopologyCreator Twitter

Answer 1: LocalCluster does not have UI support, so the UI you are seeing belongs to a different Storm cluster. To be more precise:…

# of failed Map Tasks exceeded allowed limit

冷暖自知 · submitted on 2019-12-11 11:08:13

Question: I am trying my hand at Hadoop streaming using Python. I have written simple map and reduce scripts with help from here. The map script is as follows:

#!/usr/bin/env python
import sys, urllib, re
title_re = re.compile("<title>(.*?)</title>", re.MULTILINE | re.DOTALL | re.IGNORECASE)
for line in sys.stdin:
    url = line.strip()
    match = title_re.search(urllib.urlopen(url).read())
    if match:
        print url, "\t", match.group(1).strip()

and the reduce script is as follows:

#!/usr/bin/env python
from…
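One likely reason the failed-task counter trips is that any unreachable URL raises an uncaught exception and kills the whole map task; after enough retries the job gives up. A hedged Python 3 rework of the mapper (urllib.urlopen is Python 2 only; urllib.request.urlopen is the replacement) factors the title extraction into a function so it can be exercised on a local HTML string:

```python
#!/usr/bin/env python3
# Python 3 rework of the question's mapper. The try/except keeps one
# bad URL from failing the entire map task, which is what drives the
# "# of failed Map Tasks exceeded allowed limit" error on retries.
import re
import sys
import urllib.request

title_re = re.compile(r"<title>(.*?)</title>",
                      re.MULTILINE | re.DOTALL | re.IGNORECASE)

def extract_title(html):
    """Return the stripped <title> text from an HTML string, or None."""
    match = title_re.search(html)
    return match.group(1).strip() if match else None

if __name__ == "__main__":
    for line in sys.stdin:
        url = line.strip()
        try:
            html = urllib.request.urlopen(url).read().decode("utf-8", "replace")
        except Exception:
            continue  # skip unreachable URLs instead of crashing the task
        title = extract_title(html)
        if title:
            print("%s\t%s" % (url, title))
```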

hadoop Namenode won't start

南楼画角 · submitted on 2019-12-11 11:07:35

Question: If you are visiting this link through my previous question, "hadoop2.2.0 installation on linux (NameNode not starting)", you probably know I have been trying to run single-node mode for hadoop-2.2.0 for a long time now. Finally, after following the tutorials, I can format the namenode fine; however, when I start the namenode I see the following error in the logs:

2014-05-31 15:44:20,587 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.lang…