MapReduce

Hadoop: how to start my first project

橙三吉。 submitted on 2019-12-11 04:38:54
Question: I'm starting to work with Hadoop but I don't know where or how to do it. I'm working on OS X and I followed a tutorial to install Hadoop; it's done and it works, but now I don't know what to do. Is there an IDE to install (maybe Eclipse)? I found some code but nothing works and I don't know what I have to add to my project etc ... Can you give me some information or guide me to a complete tutorial? Answer 1: If you want to learn the Hadoop framework then I recommend just starting with installing
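A common first project that needs no IDE at all is word count via Hadoop Streaming, where the mapper and reducer are plain scripts reading stdin. The sketch below factors the logic into functions so it can be tried outside Hadoop; the file names (`mapper.js`, `reducer.js`) and the use of JavaScript are illustrative assumptions, not from the question.

```javascript
// Word-count logic for a Hadoop Streaming job (a sketch; any language
// that reads stdin works, and mapper.js / reducer.js are hypothetical
// script names you would pass to the streaming jar).

// Mapper step: one input line -> ["word\t1", ...] output lines.
function mapLine(line) {
  return line
    .trim()
    .split(/\s+/)
    .filter(w => w.length > 0)
    .map(w => `${w.toLowerCase()}\t1`);
}

// Reducer step: "word\t1" lines (grouped by Hadoop's sort phase)
// -> ["word\tcount", ...] output lines.
function reduceLines(lines) {
  const counts = new Map();
  for (const line of lines) {
    const [word, n] = line.split("\t");
    counts.set(word, (counts.get(word) || 0) + Number(n));
  }
  return [...counts.entries()].map(([w, c]) => `${w}\t${c}`);
}
```

Wiring these to stdin/stdout and submitting with the `hadoop-streaming` jar is then a small, verifiable first job before moving on to the Java API.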

Is map/reduce appropriate for finding the median and mode of a set of values for many records?

五迷三道 submitted on 2019-12-11 04:35:37
Question: I have a set of objects in MongoDB that each have a set of values embedded in them, e.g.: [1.22, 12.87, 1.24, 1.24, 9.87, 1.24, 87.65] // ... up to about 150 values. Is a map/reduce the best solution for finding the median and mode (most common value) in the embedded arrays? The reason that I ask is that the map and the reduce both have to return the same (structurally) set of values. It looks like in my case I want to take in a set of values (the array) and return a set of two
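Since each array is small (about 150 values), the per-document statistics fit comfortably in memory, so the computation itself needs no map/reduce at all — a finalize step or application code can do it. A minimal sketch of the two statistics in plain JavaScript:

```javascript
// Median of one embedded array: sort a copy, take the middle element
// (or the mean of the two middle elements for even-length arrays).
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Mode: the most frequently occurring value (first one wins on ties).
function mode(values) {
  const counts = new Map();
  let best, bestCount = 0;
  for (const v of values) {
    const c = (counts.get(v) || 0) + 1;
    counts.set(v, c);
    if (c > bestCount) { best = v; bestCount = c; }
  }
  return best;
}
```

For the sample array above, `median` returns 1.24 and `mode` returns 1.24. Map/reduce only becomes necessary if the statistics must be aggregated across documents rather than within each one.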

MongoDB cannot run map reduce without the js engine

五迷三道 submitted on 2019-12-11 04:17:26
Question: I deployed a Node.js app on appcloud with MongoDB as a service. I would like to use mapReduce for some queries but I got this error: 2016-10-21 15:45:52 [APP/0] ERR ERR! { [MongoError: cannot run map reduce without the js engine] Is this supported on Swisscom appcloud or not? This is my controller (an extract): 'use strict'; const mongo = require('../mongoclient'); const paramsParser = require('../paramsParser'); const log = require('npmlog'); const faker = require('faker'); const _ = require(
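When a hosted MongoDB disables the server-side JavaScript engine, mapReduce is unavailable, but many mapReduce jobs can be rewritten as aggregation pipelines, which run natively without JS. The sketch below shows the idea with a hypothetical per-status count (the collection and field names are assumptions, not taken from the question), alongside a tiny in-memory equivalent of the `$group` stage for illustration:

```javascript
// In the mongo shell or a driver, the mapReduce-style "count per
// status" would become an aggregation pipeline (collection name and
// field assumed):
//   db.orders.aggregate([{ $group: { _id: "$status", n: { $sum: 1 } } }])

// A small in-memory equivalent of that $group stage, to show what the
// server computes:
function groupCount(docs, field) {
  const out = new Map();
  for (const d of docs) {
    out.set(d[field], (out.get(d[field]) || 0) + 1);
  }
  return [...out.entries()].map(([k, n]) => ({ _id: k, n }));
}
```

Pipelines cover grouping, sums, averages, and reshaping; only logic that genuinely needs arbitrary JavaScript is blocked by a disabled js engine.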

Riak Map Reduce in JS returning limited data

给你一囗甜甜゛ submitted on 2019-12-11 04:07:52
Question: So I have Riak running on two EC2 servers, using Python to run JavaScript MapReduce. They have been clustered, mainly as a proof of concept. There are 50 keys in the bucket; all the map/reduce function does is reformat the data. This is only for testing the map/reduce functionality in Riak. Problem: the output only shows [{u'e': 2, u'undefined': 2, u'w': 2}]. That is completely wrong. The logs show that all the keys have been processed but only 2 get returned. So my question is why is that
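One frequent cause of "all keys processed but only a couple returned" is that a reduce phase can be invoked more than once, over partial batches of its own earlier output (re-reduce); a reduce function that is not associative and able to consume its own results silently drops data. A sketch of a re-reduce-safe counting reduce (the data shape is illustrative, not from the question):

```javascript
// A reduce that merges {key: count} partials. Because its output has
// the same shape as its input, Riak can safely call it again over
// partial results.
function reduceCounts(values) {
  const out = {};
  for (const v of values) {
    for (const [k, n] of Object.entries(v)) {
      out[k] = (out[k] || 0) + n;
    }
  }
  return [out]; // Riak reduce functions return a list
}

// Simulate Riak calling reduce twice (a re-reduce over partial output):
const batch1 = reduceCounts([{ a: 1 }, { a: 1 }, { b: 1 }]);
const batch2 = reduceCounts([...batch1, { b: 1 }]);
// batch2[0] equals the result of reducing everything in one pass
```

If only reformatting is needed, dropping the reduce phase entirely and returning the map output is often the simpler fix.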

Configuring Hive to run in Local Mode

自古美人都是妖i submitted on 2019-12-11 04:06:30
Question: Hi, I am trying to run Hive in local mode. I have set the HIVE_OPTS environment variable: export HIVE_OPTS='-hiveconf mapred.job.tracker=local -hiveconf fs.default.name=file:////<myhomedir>/hivelocal/tmp -hiveconf hive.metastore.warehouse.dir=file:////<myhomedir>/hivelocal/warehouse -hiveconf javax.jdo.option.ConnectionURL=jdbc:derby:;databaseName=/<myhomedir>/hivelocal/metastore_db;create=true' and connected to Hive using the Hive client. When I create the table (named demo), I still see the table

What is wrong with this map-reduce query on mongo?

守給你的承諾、 submitted on 2019-12-11 04:05:32
Question: Please observe the mongo shell:

> map
function map() {
    if (this.server_location[0] == -77.0367) {
        emit(this._id, this);
    }
}
> reduce
function reduce(key, values) {
    return values[0];
}
> db.static.mapReduce(map, reduce, {out: 'x', query: {client_location: {$near: [-75.5, 41.89], $maxDistance: 1}}})
{
    "result" : "x",
    "timeMillis" : 43,
    "counts" : {
        "input" : 100,
        "emit" : 0,
        "reduce" : 0,
        "output" : 0
    },
    "ok" : 1,
}
> db.static.find({client_location: {$near: [-75.5, 41.89], $maxDistance: 1}, $where:
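The counts show 100 input documents but zero emits, which means the map function's condition never matched any document selected by the query. Replaying the map function locally over sample documents is a quick way to see this; one plausible culprit is the exact floating-point equality test on `server_location[0]`, sketched below (the sample documents are invented for illustration):

```javascript
// Run a mapReduce-style map function over plain objects, collecting
// what it emits, to debug an "emit": 0 result outside the server.
function runMap(docs, mapFn) {
  const emitted = [];
  for (const doc of docs) {
    mapFn.call(doc, (k, v) => emitted.push([k, v]));
  }
  return emitted;
}

const docs = [
  { _id: 1, server_location: [-77.0367, 38.895] },  // exact match
  { _id: 2, server_location: [-77.03671, 38.895] }, // near miss
];
const emitted = runMap(docs, function (emit) {
  if (this.server_location[0] === -77.0367) emit(this._id, this);
});
// Only the exact match emits; the near miss is silently skipped.
```

If the stored coordinates were ever recomputed or rounded differently, an equality test like this matches nothing, and a range check (or querying `server_location` with `$near` directly) is the safer approach.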

How to output just the value in context.write(k,v)

混江龙づ霸主 submitted on 2019-12-11 03:43:37
Question: In my MapReduce job, I just want to output some lines. But if I code like this: context.write(data, null); the program will throw java.lang.NullPointerException. I don't want to code like below: context.write(data, new Text("")); because I would have to trim the blank space in every line of the output files. Is there any good way to solve this? Thanks in advance. Sorry, it was my mistake. I checked the program carefully and found the reason is that I set the Reducer as the combiner. If I do not use the combiner,

Hive sort operation on high volume skewed dataset

早过忘川 submitted on 2019-12-11 03:36:32
Question: I am working on a big dataset of around 3 TB on Hortonworks 2.6.5; the layout of the dataset is pretty straightforward. The hierarchy of the data is as follows:

- Country
- Warehouse
- Product
- Product Type
- Product Serial Id

We have transaction data in the above hierarchy for 30 countries, each country has more than 200 warehouses, and a single country, USA, contributes around 75% of the entire dataset. Problem: 1) We have transaction data with a transaction date column (trans_dt) for the above data
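When one key (here, USA at roughly 75% of rows) dominates, sort and group operations funnel most of the data through a single reducer. A common mitigation is salting: split the hot key into N sub-keys, aggregate per sub-key in parallel, then merge. A minimal sketch of the key transformation (N, the hot-key check, and the `#` separator are all assumptions for illustration, not from the question):

```javascript
// Spread a skewed key over N sub-keys so reducers share the load.
const SALTS = 8;

function saltKey(country, rowId) {
  // Only the hot key gets a salt suffix; cold keys pass through.
  return country === "USA" ? `USA#${rowId % SALTS}` : country;
}

// After the parallel per-sub-key aggregation, strip the salt and do a
// cheap final merge over at most N partial results per hot key.
function unsaltKey(salted) {
  return salted.split("#")[0];
}
```

In Hive specifically, `hive.optimize.skewjoin` and distributing by a salted expression serve the same purpose; the sketch above is just the underlying idea.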

Could not deallocate container for task attemptId NNN

ぃ、小莉子 submitted on 2019-12-11 03:31:02
Question: I'm trying to understand how containers are allocated memory in YARN and how they perform under different hardware configurations. So, the machine has 30 GB of RAM; I picked 24 GB for YARN and left 6 GB for the system: yarn.nodemanager.resource.memory-mb=24576. Then I followed http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.0.6.0/bk_installing_manually_book/content/rpm-chap1-11.html to come up with some values for Map & Reduce task memory. I leave these two at their default values:
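The core of the sizing exercise is simple arithmetic: the node's YARN memory divided by the per-container allocation bounds how many containers can run concurrently, and requests are rounded up to a multiple of the minimum allocation. A sketch of that arithmetic (the 2048 MB container size below is an example value, not a recommendation from the linked guide):

```javascript
// Upper bound on concurrent containers on one NodeManager.
function maxContainers(nodeMemMb, containerMemMb) {
  return Math.floor(nodeMemMb / containerMemMb);
}

// YARN rounds each request up to a multiple of the scheduler's
// minimum allocation (yarn.scheduler.minimum-allocation-mb).
function roundedRequest(requestMb, minAllocMb) {
  return Math.ceil(requestMb / minAllocMb) * minAllocMb;
}
// e.g. 2048 MB mappers on a 24576 MB node allow at most 12 containers.
```

A "could not deallocate container" style of problem often traces back to these numbers disagreeing — e.g. task memory settings that don't fit evenly into the node's YARN allocation.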

MongoDB: Sort by subdocument with unknown name

☆樱花仙子☆ submitted on 2019-12-11 03:21:38
Question: I have a MongoDB collection like this:

{ id: "213", sales: { '2014-05-23': { sum: 23 }, '2014-05-22': { sum: 22 } } },
{ id: "299", sales: { '2014-05-23': { sum: 44 }, '2014-05-22': { sum: 19 } } },

I'm looking for a query to get all documents in my collection sorted by sum (the document with the largest sum on top). For the example data it should return something like this:

{ id: "299", sales: { '2014-05-23': { sum: 44 }, '2014-05-22': { sum: 19 } } },
{ id: "213", sales: { '2014-05-23':
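Because the subdocument keys are dates that differ per document, the sort key can't be named statically in a query; one workable approach is to compute each document's largest `sum` in application code and sort on that. A sketch against the example documents above (whether "sorted by sum" means the largest single day or the total is ambiguous in the question — this assumes the largest single day):

```javascript
// Largest sales.sum across a document's unknown date keys.
function maxSum(doc) {
  return Math.max(...Object.values(doc.sales).map(s => s.sum));
}

// Descending sort, largest max sum first (non-mutating copy).
function sortByMaxSum(docs) {
  return [...docs].sort((a, b) => maxSum(b) - maxSum(a));
}
```

A more scalable alternative is to restructure `sales` as an array of `{date, sum}` subdocuments, which makes the value sortable and indexable directly in MongoDB without client-side work.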