MapReduce

Hadoop: how to start my first project

橙三吉。 submitted on 2019-12-11 04:38:54
Question: I'm starting to work with Hadoop but I don't know where or how to do it. I'm working on OS X and I followed a tutorial to install Hadoop; it's done and it works, but now I don't know what to do. Is there an IDE to install (maybe Eclipse)? I found some code but nothing works and I don't know what I have to add to my project etc ... Can you give me some information or guide me to a complete tutorial? Answer 1: If you want to learn the Hadoop framework then I recommend just starting with installing
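A common first project that needs no IDE at all is word count via Hadoop Streaming, where the mapper and reducer are plain scripts reading stdin. The sketch below factors the logic into functions so it can be tried outside Hadoop; the file names (`mapper.js`, `reducer.js`) and the use of JavaScript are illustrative assumptions, not from the question.

```javascript
// Word-count logic for a Hadoop Streaming job (a sketch; any language
// that reads stdin works, and mapper.js / reducer.js are hypothetical
// script names you would pass to the streaming jar).

// Mapper step: one input line -> ["word\t1", ...] output lines.
function mapLine(line) {
  return line
    .trim()
    .split(/\s+/)
    .filter(w => w.length > 0)
    .map(w => `${w.toLowerCase()}\t1`);
}

// Reducer step: "word\t1" lines (grouped by Hadoop's sort phase)
// -> ["word\tcount", ...] output lines.
function reduceLines(lines) {
  const counts = new Map();
  for (const line of lines) {
    const [word, n] = line.split("\t");
    counts.set(word, (counts.get(word) || 0) + Number(n));
  }
  return [...counts.entries()].map(([w, c]) => `${w}\t${c}`);
}
```

Wiring these to stdin/stdout and submitting with the `hadoop-streaming` jar is then a small, verifiable first job before moving on to the Java API.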

Is map/reduce appropriate for finding the median and mode of a set of values for many records?

五迷三道 submitted on 2019-12-11 04:35:37
Question: I have a set of objects in MongoDB that each have a set of values embedded in them, e.g.: [1.22, 12.87, 1.24, 1.24, 9.87, 1.24, 87.65] // ... up to about 150 values. Is a map/reduce the best solution for finding the median and mode (most common value) in the embedded arrays? The reason that I ask is that the map and the reduce both have to return the same (structurally) set of values. It looks like in my case I want to take in a set of values (the array) and return a set of two
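Since each array is small (about 150 values), the per-document statistics fit comfortably in memory, so the computation itself needs no map/reduce at all — a finalize step or application code can do it. A minimal sketch of the two statistics in plain JavaScript:

```javascript
// Median of one embedded array: sort a copy, take the middle element
// (or the mean of the two middle elements for even-length arrays).
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Mode: the most frequently occurring value (first one wins on ties).
function mode(values) {
  const counts = new Map();
  let best, bestCount = 0;
  for (const v of values) {
    const c = (counts.get(v) || 0) + 1;
    counts.set(v, c);
    if (c > bestCount) { best = v; bestCount = c; }
  }
  return best;
}
```

For the sample array above, `median` returns 1.24 and `mode` returns 1.24. Map/reduce only becomes necessary if the statistics must be aggregated across documents rather than within each one.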

MongoDB cannot run map reduce without the js engine

五迷三道 submitted on 2019-12-11 04:17:26
Question: I deployed a Node.js app on appcloud with MongoDB as a service. I would like to use mapReduce for some queries but I got this error: 2016-10-21 15:45:52 [APP/0] ERR ERR! { [MongoError: cannot run map reduce without the js engine] Is this supported on Swisscom appcloud or not? This is my controller (an extract): 'use strict'; const mongo = require('../mongoclient'); const paramsParser = require('../paramsParser'); const log = require('npmlog'); const faker = require('faker'); const _ = require(
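When a hosted MongoDB disables the server-side JavaScript engine, mapReduce is unavailable, but many mapReduce jobs can be rewritten as aggregation pipelines, which run natively without JS. The sketch below shows the idea with a hypothetical per-status count (the collection and field names are assumptions, not taken from the question), alongside a tiny in-memory equivalent of the `$group` stage for illustration:

```javascript
// In the mongo shell or a driver, the mapReduce-style "count per
// status" would become an aggregation pipeline (collection name and
// field assumed):
//   db.orders.aggregate([{ $group: { _id: "$status", n: { $sum: 1 } } }])

// A small in-memory equivalent of that $group stage, to show what the
// server computes:
function groupCount(docs, field) {
  const out = new Map();
  for (const d of docs) {
    out.set(d[field], (out.get(d[field]) || 0) + 1);
  }
  return [...out.entries()].map(([k, n]) => ({ _id: k, n }));
}
```

Pipelines cover grouping, sums, averages, and reshaping; only logic that genuinely needs arbitrary JavaScript is blocked by a disabled js engine.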

Riak Map Reduce in JS returning limited data

给你一囗甜甜゛ submitted on 2019-12-11 04:07:52
Question: So I have Riak running on two EC2 servers, using Python to run JavaScript MapReduce. They have been clustered, mainly as a proof of concept. There are 50 keys in the bucket; all the map/reduce function does is reformat the data. This is only for testing the map/reduce functionality in Riak. Problem: the output only shows [{u'e': 2, u'undefined': 2, u'w': 2}]. That is completely wrong. The logs show that all the keys have been processed but only 2 get returned. So my question is why is that
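One frequent cause of "all keys processed but only a couple returned" is that a reduce phase can be invoked more than once, over partial batches of its own earlier output (re-reduce); a reduce function that is not associative and able to consume its own results silently drops data. A sketch of a re-reduce-safe counting reduce (the data shape is illustrative, not from the question):

```javascript
// A reduce that merges {key: count} partials. Because its output has
// the same shape as its input, Riak can safely call it again over
// partial results.
function reduceCounts(values) {
  const out = {};
  for (const v of values) {
    for (const [k, n] of Object.entries(v)) {
      out[k] = (out[k] || 0) + n;
    }
  }
  return [out]; // Riak reduce functions return a list
}

// Simulate Riak calling reduce twice (a re-reduce over partial output):
const batch1 = reduceCounts([{ a: 1 }, { a: 1 }, { b: 1 }]);
const batch2 = reduceCounts([...batch1, { b: 1 }]);
// batch2[0] equals the result of reducing everything in one pass
```

If only reformatting is needed, dropping the reduce phase entirely and returning the map output is often the simpler fix.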

Configuring Hive to run in Local Mode

自古美人都是妖i submitted on 2019-12-11 04:06:30
Question: Hi, I am trying to run Hive in local mode. I have set the HIVE_OPTS environment variable: export HIVE_OPTS='-hiveconf mapred.job.tracker=local -hiveconf fs.default.name=file:////<myhomedir>/hivelocal/tmp -hiveconf hive.metastore.warehouse.dir=file:////<myhomedir>/hivelocal/warehouse -hiveconf javax.jdo.option.ConnectionURL=jdbc:derby:;databaseName=/<myhomedir>/hivelocal/metastore_db;create=true' and connected to Hive using the Hive client. When I create the table (named demo), I still see the table

What is wrong with this map-reduce query on mongo?

守給你的承諾、 submitted on 2019-12-11 04:05:32
Question: Please observe the mongo shell:

> map
function map() {
    if (this.server_location[0] == -77.0367) {
        emit(this._id, this);
    }
}
> reduce
function reduce(key, values) {
    return values[0];
}
> db.static.mapReduce(map, reduce, {out: 'x', query: {client_location: {$near: [-75.5, 41.89], $maxDistance: 1}}})
{
    "result" : "x",
    "timeMillis" : 43,
    "counts" : {
        "input" : 100,
        "emit" : 0,
        "reduce" : 0,
        "output" : 0
    },
    "ok" : 1,
}
> db.static.find({client_location: {$near: [-75.5, 41.89], $maxDistance: 1}, $where:
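The counts show 100 input documents but zero emits, which means the map function's condition never matched any document selected by the query. Replaying the map function locally over sample documents is a quick way to see this; one plausible culprit is the exact floating-point equality test on `server_location[0]`, sketched below (the sample documents are invented for illustration):

```javascript
// Run a mapReduce-style map function over plain objects, collecting
// what it emits, to debug an "emit": 0 result outside the server.
function runMap(docs, mapFn) {
  const emitted = [];
  for (const doc of docs) {
    mapFn.call(doc, (k, v) => emitted.push([k, v]));
  }
  return emitted;
}

const docs = [
  { _id: 1, server_location: [-77.0367, 38.895] },  // exact match
  { _id: 2, server_location: [-77.03671, 38.895] }, // near miss
];
const emitted = runMap(docs, function (emit) {
  if (this.server_location[0] === -77.0367) emit(this._id, this);
});
// Only the exact match emits; the near miss is silently skipped.
```

If the stored coordinates were ever recomputed or rounded differently, an equality test like this matches nothing, and a range check (or querying `server_location` with `$near` directly) is the safer approach.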

How to output just the value in context.write(k,v)

混江龙づ霸主 submitted on 2019-12-11 03:43:37
Question: In my MapReduce job, I just want to output some lines. But if I code like this: context.write(data, null); the program will throw java.lang.NullPointerException. I don't want to code like below: context.write(data, new Text("")); because I would have to trim the blank space in every line of the output files. Is there any good way to solve this? Thanks in advance. Sorry, it was my mistake. I checked the program carefully and found the reason is that I set the Reducer as the combiner. If I do not use the combiner,

Hive sort operation on high volume skewed dataset

早过忘川 submitted on 2019-12-11 03:36:32
Question: I am working on a big dataset of around 3 TB on Hortonworks 2.6.5; the layout of the dataset is pretty straightforward. The hierarchy of the data is as follows:

- Country
- Warehouse
- Product
- Product Type
- Product Serial Id

We have transaction data in the above hierarchy for 30 countries, each country has more than 200 warehouses, and a single country, USA, contributes around 75% of the entire dataset. Problem: 1) We have transaction data with a transaction date column (trans_dt) for the above data
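When one key (here, USA at roughly 75% of rows) dominates, sort and group operations funnel most of the data through a single reducer. A common mitigation is salting: split the hot key into N sub-keys, aggregate per sub-key in parallel, then merge. A minimal sketch of the key transformation (N, the hot-key check, and the `#` separator are all assumptions for illustration, not from the question):

```javascript
// Spread a skewed key over N sub-keys so reducers share the load.
const SALTS = 8;

function saltKey(country, rowId) {
  // Only the hot key gets a salt suffix; cold keys pass through.
  return country === "USA" ? `USA#${rowId % SALTS}` : country;
}

// After the parallel per-sub-key aggregation, strip the salt and do a
// cheap final merge over at most N partial results per hot key.
function unsaltKey(salted) {
  return salted.split("#")[0];
}
```

In Hive specifically, `hive.optimize.skewjoin` and distributing by a salted expression serve the same purpose; the sketch above is just the underlying idea.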

Could not deallocate container for task attemptId NNN

ぃ、小莉子 submitted on 2019-12-11 03:31:02
Question: I'm trying to understand how containers are allocated memory in YARN and how they perform under different hardware configurations. So, the machine has 30 GB of RAM; I picked 24 GB for YARN and left 6 GB for the system: yarn.nodemanager.resource.memory-mb=24576. Then I followed http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.0.6.0/bk_installing_manually_book/content/rpm-chap1-11.html to come up with some values for Map & Reduce task memory. I leave these two at their default values:
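The core of the sizing exercise is simple arithmetic: the node's YARN memory divided by the per-container allocation bounds how many containers can run concurrently, and requests are rounded up to a multiple of the minimum allocation. A sketch of that arithmetic (the 2048 MB container size below is an example value, not a recommendation from the linked guide):

```javascript
// Upper bound on concurrent containers on one NodeManager.
function maxContainers(nodeMemMb, containerMemMb) {
  return Math.floor(nodeMemMb / containerMemMb);
}

// YARN rounds each request up to a multiple of the scheduler's
// minimum allocation (yarn.scheduler.minimum-allocation-mb).
function roundedRequest(requestMb, minAllocMb) {
  return Math.ceil(requestMb / minAllocMb) * minAllocMb;
}
// e.g. 2048 MB mappers on a 24576 MB node allow at most 12 containers.
```

A "could not deallocate container" style of problem often traces back to these numbers disagreeing — e.g. task memory settings that don't fit evenly into the node's YARN allocation.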

MongoDB: Sort by subdocument with unknown name

☆樱花仙子☆ submitted on 2019-12-11 03:21:38
Question: I have a MongoDB collection like this:

{ id: "213", sales: { '2014-05-23': { sum: 23 }, '2014-05-22': { sum: 22 } } },
{ id: "299", sales: { '2014-05-23': { sum: 44 }, '2014-05-22': { sum: 19 } } },

I'm looking for a query to get all documents in my collection sorted by sum (the document with the largest sum on top). For the example data it should return something like this:

{ id: "299", sales: { '2014-05-23': { sum: 44 }, '2014-05-22': { sum: 19 } } },
{ id: "213", sales: { '2014-05-23':
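Because the subdocument keys are dates that differ per document, the sort key can't be named statically in a query; one workable approach is to compute each document's largest `sum` in application code and sort on that. A sketch against the example documents above (whether "sorted by sum" means the largest single day or the total is ambiguous in the question — this assumes the largest single day):

```javascript
// Largest sales.sum across a document's unknown date keys.
function maxSum(doc) {
  return Math.max(...Object.values(doc.sales).map(s => s.sum));
}

// Descending sort, largest max sum first (non-mutating copy).
function sortByMaxSum(docs) {
  return [...docs].sort((a, b) => maxSum(b) - maxSum(a));
}
```

A more scalable alternative is to restructure `sales` as an array of `{date, sum}` subdocuments, which makes the value sortable and indexable directly in MongoDB without client-side work.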