MapReduce

Convert a sequence file and get key/value pairs via map and reduce tasks in Hadoop

Submitted by 拈花ヽ惹草 on 2019-12-12 10:22:10
Question: I want to get all key/value pairs from a sequence file via a Hadoop MapReduce application. I followed this post, http://lintool.github.com/Cloud9/docs/content/staging-records.html, for reading the sequence file in the main class, but that didn't work. I want to print all key/value pairs to a normal text file in HDFS; how can I achieve that? I wrote my code as below.

    import java.io.File;
    import java.io.IOException;
    import java.util.*;
    import java.util.logging.Level;
    import java.util
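A minimal sketch of one way to do this, assuming the file's key and value classes implement Writable and that the two program arguments are the input sequence file and the text output path (the class name SequenceFileDump and the argument layout are illustrative):

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Writable;
    import org.apache.hadoop.util.ReflectionUtils;

    public class SequenceFileDump {
        public static void main(String[] args) throws IOException {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            SequenceFile.Reader reader =
                new SequenceFile.Reader(fs, new Path(args[0]), conf);
            FSDataOutputStream txt = fs.create(new Path(args[1]));
            try {
                // Instantiate key/value objects of whatever types the file declares.
                Writable key = (Writable)
                    ReflectionUtils.newInstance(reader.getKeyClass(), conf);
                Writable value = (Writable)
                    ReflectionUtils.newInstance(reader.getValueClass(), conf);
                // Walk the file and write one "key <TAB> value" line per record.
                while (reader.next(key, value)) {
                    txt.writeBytes(key + "\t" + value + "\n");
                }
            } finally {
                reader.close();
                txt.close();
            }
        }
    }

Note this reads the file in the driver rather than in map/reduce tasks, which is usually all that is needed to dump a sequence file to text.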

Can HBase, MapReduce and HDFS work on a single machine with Hadoop installed and running on it?

Submitted by 邮差的信 on 2019-12-12 10:18:01
Question: I am working on a search engine design, which is to be run on the cloud. We have just started and do not have much experience with Hadoop yet. Can anyone tell me whether HBase, MapReduce and HDFS can work on a single machine with Hadoop installed and running on it?

Answer 1: Yes you can. You can even create a virtual machine and run it on there on a single "computer" (which is what I have :) ). The key is to simply install Hadoop in "Pseudo Distributed Mode", which is even described in the Hadoop Quickstart. If you use
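For reference, the pseudo-distributed setup the answer refers to boils down to three small config files; this sketch follows the Hadoop 1.x quickstart (the localhost ports are the conventional defaults, adjust as needed):

    <!-- conf/core-site.xml -->
    <configuration>
        <property>
            <name>fs.default.name</name>
            <value>hdfs://localhost:9000</value>
        </property>
    </configuration>

    <!-- conf/hdfs-site.xml: one machine, so keep a single replica -->
    <configuration>
        <property>
            <name>dfs.replication</name>
            <value>1</value>
        </property>
    </configuration>

    <!-- conf/mapred-site.xml -->
    <configuration>
        <property>
            <name>mapred.job.tracker</name>
            <value>localhost:9001</value>
        </property>
    </configuration>

HBase then runs on top of that single-node HDFS in its own standalone or pseudo-distributed mode.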

Hadoop wordcount unable to run - need help decoding the Hadoop error message

Submitted by 拈花ヽ惹草 on 2019-12-12 10:09:28
Question: I need some help figuring out why my job failed. I built a single-node cluster just to try it out. I followed the example here. Everything seems to be working correctly: I formatted the namenode, am able to connect to the jobtracker, datanode, and namenode via the web interface, and am able to start and stop all the Hadoop services. However, when I try to run the wordcount example, I get this:

    Error initializing attempt_201105161023_0002_m_000011_0:
    java.io.IOException: Exception reading

Mongo MapReduce select latest date

Submitted by 荒凉一梦 on 2019-12-12 09:41:24
Question: I can't seem to get my MapReduce reduce function to work properly. Here is my map function:

    function Map() {
        var day = Date.UTC(this.TimeStamp.getFullYear(),
                           this.TimeStamp.getMonth(),
                           this.TimeStamp.getDate());
        emit(
            { search_dt: new Date(day), user_id: this.UserId },
            { timestamp: this.TimeStamp }
        );
    }

And here is my reduce function (two fixes applied to the snippet: an object literal needs braces rather than brackets, and a forEach callback must use return rather than continue; the truncated tail is completed with the obvious assignment):

    function Reduce(key, values) {
        var result = { timestamp: 0 };
        values.forEach(function (value) {
            if (!value.timestamp) return;           // skip empty entries
            if (result.timestamp < value.timestamp)
                result.timestamp = value.timestamp; // keep the latest date
        });
        return result;
    }
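A hedged invocation for the pair above, assuming the source collection is named searches (that name and the output name latest_per_user are illustrative):

    db.searches.mapReduce(Map, Reduce, { out: "latest_per_user" });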

MapReduce count example

Submitted by 烂漫一生 on 2019-12-12 09:37:54
Question: My question is about MapReduce programming in Java. Suppose I have the WordCount.java example, a standard MapReduce program. I want the map function to collect some information and return it to the reduce function as maps of the form <slaveNode_id, some_info_collected>, so that I can know which slave node collected which data. Any idea how?

    public class WordCount {
        public static class Map extends MapReduceBase
                implements Mapper<LongWritable, Text, Text, IntWritable> {
            private final static
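One way to get the slave node into the output, sketched against the old mapred API used in the snippet: tag each emission with the host name the map task runs on. InetAddress.getLocalHost() is a standard way to obtain it; the class name and the choice to emit the raw input line as the "info" are illustrative:

    import java.io.IOException;
    import java.net.InetAddress;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    public class HostTaggingMap extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, Text> {

        public void map(LongWritable key, Text value,
                        OutputCollector<Text, Text> output, Reporter reporter)
                throws IOException {
            // The host this map task runs on stands in for the slaveNode_id.
            String host = InetAddress.getLocalHost().getHostName();
            // Emit <slaveNode_id, some_info_collected>; here the "info"
            // is simply the input line itself.
            output.collect(new Text(host), value);
        }
    }

The reducer then receives all values grouped per slave node. Note this records where the map task ran, which with data locality is usually also where the data block lives.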

Hadoop performance

Submitted by 狂风中的少年 on 2019-12-12 08:10:14
Question: I installed Hadoop 1.0.0 and tried out the word-count example (single-node cluster). It took 2m 48s to complete. Then I tried the standard Linux word-count program, which ran in 10 milliseconds on the same data set (180 kB). Am I doing something wrong, or is Hadoop very, very slow?

    time hadoop jar /usr/share/hadoop/hadoop*examples*.jar wordcount someinput someoutput
    12/01/29 23:04:41 INFO input.FileInputFormat: Total input paths to process : 30
    12/01/29 23:04:41 INFO mapred.JobClient: Running

Select distinct on more than one field using MongoDB's map-reduce

Submitted by 爱⌒轻易说出口 on 2019-12-12 08:09:47
Question: I want to execute this SQL statement on MongoDB:

    SELECT DISTINCT book, author FROM library

So far MongoDB's DISTINCT only supports one field at a time. For more than one field, we have to use the GROUP command or map-reduce. I have googled a way to use the GROUP command:

    db.library.group({
        key: { book: 1, author: 1 },
        reduce: function (obj, prev) {
            if (!obj.hasOwnProperty("key")) {
                prev.book = obj.book;
                prev.author = obj.author;
            }
        },
        initial: {}
    });

However, this approach only supports up to 10,000 keys.
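With map-reduce the 10,000-key limit of group goes away, because the distinct pair itself becomes the emitted key; a minimal shell sketch (the output collection name distinct_book_author is illustrative):

    db.library.mapReduce(
        function () {
            // The compound key is the "distinct" value we are after.
            emit({ book: this.book, author: this.author }, null);
        },
        function (key, values) {
            // Duplicates collapse on the key; nothing to aggregate.
            return null;
        },
        { out: "distinct_book_author" }
    );

    // Each distinct (book, author) pair is now an _id in the output collection:
    db.distinct_book_author.find();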

Hadoop MapReduce for the Google web graph

Submitted by 隐身守侯 on 2019-12-12 08:06:46
Question: We have been given as an assignment the task of creating MapReduce functions that will output, for each node n in the Google web graph list, the nodes that you can reach from node n in 3 hops. (The actual data can be found here: http://snap.stanford.edu/data/web-Google.html) Here's an example of how the items in the list look:

    1 2
    1 3
    2 4
    3 4
    3 5
    4 1
    4 5
    4 6
    5 6

From the above, an example graph can be drawn. In this simplified example, the paths of node 1 include, for example, [1 -> 2 -> 4 -> 1
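One way to structure this as iterated MapReduce, a sketch rather than a full solution (assumes Hadoop 2.x and tab-separated "src dst" lines, as in the SNAP file): seed a "frontier" of 1-hop pairs with the edge list itself, then run the job below twice more. Each pass joins "n reaches w in k hops" against the edges leaving w, producing the (k+1)-hop pairs:

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class HopExpand {
        // Edge u -> v: key by u so the reducer at u sees its out-edges.
        public static class EdgeMapper extends Mapper<Text, Text, Text, Text> {
            protected void map(Text src, Text dst, Context ctx)
                    throws IOException, InterruptedException {
                ctx.write(src, new Text("E\t" + dst));
            }
        }

        // Frontier pair (n, w) means "n reaches w in k hops": key by w.
        public static class FrontierMapper extends Mapper<Text, Text, Text, Text> {
            protected void map(Text n, Text w, Context ctx)
                    throws IOException, InterruptedException {
                ctx.write(w, new Text("F\t" + n));
            }
        }

        // At key w, cross every origin n with every edge w -> v.
        public static class JoinReducer extends Reducer<Text, Text, Text, Text> {
            protected void reduce(Text w, Iterable<Text> vals, Context ctx)
                    throws IOException, InterruptedException {
                List<String> origins = new ArrayList<String>();
                List<String> targets = new ArrayList<String>();
                for (Text t : vals) {
                    String[] p = t.toString().split("\t", 2);
                    (p[0].equals("E") ? targets : origins).add(p[1]);
                }
                for (String n : origins)
                    for (String v : targets)
                        ctx.write(new Text(n), new Text(v)); // a (k+1)-hop pair
            }
        }

        public static void main(String[] args) throws Exception {
            // args: <edge dir> <frontier dir> <output dir>
            Job job = Job.getInstance();
            job.setJarByClass(HopExpand.class);
            MultipleInputs.addInputPath(job, new Path(args[0]),
                    KeyValueTextInputFormat.class, EdgeMapper.class);
            MultipleInputs.addInputPath(job, new Path(args[1]),
                    KeyValueTextInputFormat.class, FrontierMapper.class);
            FileOutputFormat.setOutputPath(job, new Path(args[2]));
            job.setReducerClass(JoinReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(Text.class);
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

Duplicate pairs can appear when several paths reach the same node; a final distinct pass, or a set in the reducer, cleans that up.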

Hadoop on Windows Server

Submitted by 送分小仙女□ on 2019-12-12 07:09:06
Question: I'm thinking about using Hadoop to process large text files on my existing Windows 2003 servers (about 10 quad-core machines with 16 GB of RAM). The questions are:

- Is there any good tutorial on how to configure a Hadoop cluster on Windows?
- What are the requirements? Java + Cygwin + sshd? Anything else?
- HDFS, does it play nice on Windows?
- I'd like to use Hadoop in streaming mode. Any advice, tool or trick to develop my own mappers / reducers in C#? (A launch sketch follows below.)
- What do you use for submitting and monitoring
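For the streaming part, mappers and reducers are just executables that read stdin and write "key <TAB> value" lines to stdout, so a C# console program works; a hedged launch example (the jar path matches the Hadoop 1.x contrib layout; the .exe names and HDFS paths are illustrative):

    hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-*.jar \
        -input /data/in \
        -output /data/out \
        -mapper WordMapper.exe \
        -reducer WordReducer.exe \
        -file WordMapper.exe \
        -file WordReducer.exe

The -file flags ship the executables to every task node along with the job.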

Map-Reduce to combine data (MongoDB)

Submitted by 一笑奈何 on 2019-12-12 07:05:48
Question: I have two collections.

LogData:

    [
        { "SId": 10, "NoOfDaya": 9, "Status": 4 },
        { "SId": 11, "NoOfDaya": 8, "Status": 2 }
    ]

OptData:

    [
        { "SId": 10, "CId": 12, "CreatedDate": ISO(24-10-2014) },
        { "SId": 10, "CId": 13, "CreatedDate": ISO(24-10-2014) }
    ]

Now, using MongoDB, I need to find the data in the form:

    SELECT a.SPID, a.CreatedDate, CID = MAX(a.CID)
    FROM OptData a
    JOIN LogData c ON a.SID = c.SID
    WHERE Status > 2
    GROUP BY a.SPID, a.CreatedDate

LogData has 600 records, whereas OptData has 90 million records
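Since LogData is tiny relative to OptData, one workable pattern is to pull the qualifying SIds into memory first and push them into the map-reduce as a query filter; a shell sketch (assumes SPID in the SQL means SId, and the output collection name opt_latest is illustrative):

    // Step 1: the ~600-row side of the join fits in memory.
    var ids = db.LogData.distinct("SId", { Status: { $gt: 2 } });

    // Step 2: group OptData by (SId, CreatedDate), keeping the max CId.
    db.OptData.mapReduce(
        function () {
            emit({ SId: this.SId, CreatedDate: this.CreatedDate }, this.CId);
        },
        function (key, values) {
            return Math.max.apply(null, values); // MAX(CId) per group
        },
        {
            query: { SId: { $in: ids } },  // the "join" happens here
            out: "opt_latest"
        }
    );

The query filter restricts the map phase to documents whose SId actually matches, and writing to a collection avoids the 16 MB inline-result limit.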