MapReduce

Hadoop MapReduce implementation of shortest path in a graph, not just the distance

Submitted by 一世执手 on 2019-12-12 12:28:37
Question: I have been looking for a MapReduce implementation of shortest-path search algorithms. However, all the instances I could find computed the shortest distance from node x to y, and none actually output the actual shortest path, like x-a-b-c-y. As for what I am trying to achieve: I have graphs with hundreds of thousands of nodes, and I need to perform frequent-pattern analysis on shortest paths among the various nodes. This is for a research project I am working on. It would be a great…
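The distance-only versions can be extended by making each frontier record carry its path as well as its distance. Below is a minimal local Python simulation of that parallel-BFS MapReduce pattern; the graph, function names, and record shape are all hypothetical, and a real Hadoop job would express the same map and reduce steps over HDFS records.

```python
# Toy adjacency list; in a real job this would live in HDFS records.
GRAPH = {
    "x": ["a", "d"],
    "a": ["b"],
    "b": ["c"],
    "c": ["y"],
    "d": ["y"],
    "y": [],
}

def map_phase(records):
    """Emit each node's current state plus relaxed states for its neighbors.

    A record is (node, (dist, path)); the path list is the extra piece
    that distance-only implementations drop."""
    for node, (dist, path) in records:
        yield node, (dist, path)                     # keep own state
        if dist != float("inf"):
            for nbr in GRAPH[node]:
                yield nbr, (dist + 1, path + [nbr])  # relaxation step

def reduce_phase(mapped):
    """Keep the minimum-distance (dist, path) per node."""
    best = {}
    for node, state in mapped:
        if node not in best or state[0] < best[node][0]:
            best[node] = state
    return best

# Iterate map/reduce rounds until the BFS frontier has covered the graph.
records = [(n, (0, ["x"])) if n == "x" else (n, (float("inf"), []))
           for n in GRAPH]
for _ in range(len(GRAPH)):
    records = list(reduce_phase(map_phase(records)).items())

print(dict(records)["y"])  # (2, ['x', 'd', 'y'])
```

In a real job each round is one MapReduce pass, and iteration stops when no node's distance changes between rounds.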

Create Custom InputFormat of ColumnFamilyInputFormat for cassandra

Submitted by 依然范特西╮ on 2019-12-12 12:27:03
Question: I am working on a project using Cassandra 1.2 and Hadoop 1.2. I have created my normal Cassandra mapper and reducer, but I want to create my own input format class, which will read the records from Cassandra; I'll get the desired column's value by splitting and indexing that value, so I planned to create a custom format class. But I'm confused and can't work out how to do it. Which classes are to be extended and implemented, and how will I be able to fetch the row key,…
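The "splitting and indexing" step the question describes is just value parsing, which can live in the mapper rather than in a custom InputFormat. A sketch of that extraction logic, with a hypothetical delimiter and field position:

```python
def extract_field(column_value: str, delimiter: str = ":", index: int = 1) -> str:
    """Split a raw column value and pick one component by position.

    e.g. a value like "user123:click:2019-12-12" with index 1 -> "click".
    The delimiter and index here are illustrative assumptions."""
    parts = column_value.split(delimiter)
    if index >= len(parts):
        raise ValueError(f"value {column_value!r} has no field {index}")
    return parts[index]

print(extract_field("user123:click:2019-12-12"))  # click
```

Subclassing the input format is only needed when the split/record-reading behavior itself must change, not to reshape individual column values.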

NoSuchMethodError when running on Hadoop but not when run locally

Submitted by 瘦欲@ on 2019-12-12 12:13:36
Question: While running a program on Hadoop 2.0.0-cdh4.3.1, MapReduce gives me the error below: java.lang.NoSuchMethodError: com.google.common.util.concurrent.Futures.withFallback. But when I test by executing the JAR directly (java -cp myclass), it runs flawlessly. I am out of ideas here, since Futures.withFallback is present in the JAR; that is why it executes locally. It uses Guava for connecting to Cassandra; the full stack trace is below: attempt_201507081740_21115_m_000050_0: [FATAL] Child - Error running child :…
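A NoSuchMethodError that appears only on the cluster usually means the cluster's task classpath puts an older copy of the library (here, a Guava without Futures.withFallback) ahead of the one bundled in the job JAR. A small hypothetical helper to check which JARs on a classpath actually contain a given class:

```python
import zipfile

def jars_containing(jar_paths, class_name):
    """Return the jars whose entries include the given class.

    class_name uses dots, e.g. "com.google.common.util.concurrent.Futures";
    inside a jar, the entry uses slashes plus a ".class" suffix."""
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for jar in jar_paths:
        with zipfile.ZipFile(jar) as zf:
            if entry in zf.namelist():
                hits.append(jar)
    return hits
```

Running this over both the job JAR and the Hadoop/Cassandra lib directories will show whether two versions of the class exist; whichever JAR comes first on the task classpath wins, which explains a cluster-only failure.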

Finding most commonly used word in a string field throughout a collection

Submitted by ☆樱花仙子☆ on 2019-12-12 12:12:14
Question: Let's say I have a Mongo collection similar to the following: [ { "foo": "bar baz boo" }, { "foo": "bar baz" }, { "foo": "boo baz" } ] Is it possible to determine which words appear most often within the foo field (ideally with a count)? For instance, I'd love a result set something like: [ { "baz" : 3 }, { "boo" : 2 }, { "bar" : 2 } ] Answer 1: There was recently a closed JIRA issue about a $split operator to be used in the $project stage of the aggregation framework. With that in place you…
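Until $split is available, the same tally can be computed client-side (or via mapReduce). A minimal Python sketch of the desired result over the sample documents:

```python
from collections import Counter

docs = [
    {"foo": "bar baz boo"},
    {"foo": "bar baz"},
    {"foo": "boo baz"},
]

# Tokenize each foo field on whitespace and tally words across the collection.
counts = Counter(word for d in docs for word in d["foo"].split())
print(counts.most_common())  # [('baz', 3), ('bar', 2), ('boo', 2)]
```

The equivalent mapReduce would emit (word, 1) per token in map and sum in reduce.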

Runtimeexception: java.lang.NoSuchMethodException: tfidf$Reduce.<init>()

Submitted by 不打扰是莪最后的温柔 on 2019-12-12 11:48:21
Question: How do I solve this problem? tfidf is my main class; why is this error coming after running the jar file? java.lang.RuntimeException: java.lang.NoSuchMethodException: tfidf$Reduce.<init>() at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:115) at org.apache.hadoop.mapred.Task$OldCombinerRunner.combine(Task.java:1423) at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1436) at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1298) at…

Hadoop, MapReduce - Multiple Input/Output Paths

Submitted by 左心房为你撑大大i on 2019-12-12 11:03:21
Question: When making the JAR for my MapReduce job, I am using the hadoop-local command on my input file. I wanted to know whether there was a way of, instead of specifically specifying the path for each file in my input folder to be used in the MapReduce job, just specifying and passing all the files from my input folder. This is because the contents and number of files could change due to the nature of the MapReduce job I am trying to configure, and as I do not know the specific amount of…
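Hadoop's FileInputFormat accepts a directory as an input path and then takes every non-hidden file in it (files starting with "." or "_" are skipped), so listing individual files is not required. A local Python sketch of that same "process whatever is in the folder" selection, with hypothetical paths:

```python
import glob
import os

def input_files(input_dir):
    """All regular, non-hidden files in the folder, mirroring what
    FileInputFormat selects when given a directory instead of files."""
    return sorted(
        p for p in glob.glob(os.path.join(input_dir, "*"))
        if os.path.isfile(p)
        and not os.path.basename(p).startswith((".", "_"))
    )
```

On the Hadoop side the equivalent is passing the directory itself, e.g. addInputPath(job, new Path("/data/input")); glob patterns like /data/input/part-* are also accepted.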

Querying Hbase efficiently

Submitted by 不羁的心 on 2019-12-12 10:58:04
Question: I'm using Java as a client for querying HBase. My HBase table is set up like this:

ROWKEY     | HOST         | EVENT
-----------|--------------|----------
21_1465435 | host.hst.com | clicked
22_1463456 | hlo.wrld.com | dragged
...        | ...          | ...

The first thing I need to do is get a list of all ROWKEYs which have host.hst.com associated with them. I can create a scanner at column host and, for each row with column value = host.hst.com, add the corresponding ROWKEY to the list. Seems pretty…
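Conceptually the scan described is a filtered full-table scan (in HBase, a scan with a SingleColumnValueFilter on the host column). A local sketch of that selection over hypothetical rows:

```python
rows = {
    "21_1465435": {"host": "host.hst.com", "event": "clicked"},
    "22_1463456": {"host": "hlo.wrld.com", "event": "dragged"},
    "23_1465777": {"host": "host.hst.com", "event": "scrolled"},
}

def rowkeys_for_host(table, host):
    """Full-scan selection: every rowkey whose host column matches.

    In HBase a SingleColumnValueFilter does this server-side, but it
    still touches every row; for frequent lookups, a secondary index
    table keyed host -> rowkeys is the usual efficient answer."""
    return sorted(k for k, cols in table.items() if cols["host"] == host)

print(rowkeys_for_host(rows, "host.hst.com"))  # ['21_1465435', '23_1465777']
```

The trade-off: the filter avoids shipping non-matching rows to the client but not the scan cost itself, whereas an index table turns the lookup into a short prefix scan.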

Reduce is called several times with the same key in mongodb map-reduce

Submitted by 十年热恋 on 2019-12-12 10:49:49
Question: I'm trying to run map-reduce on MongoDB in the mongo shell. For some reason, in the reduce phase, I get several calls for the same key (instead of a single one), so I get wrong results. I'm not an expert in this domain, so maybe I'm making some stupid mistake. Any help appreciated. Thanks. This is my small example. I'm creating 10000 documents: var i = 0; db.docs.drop(); while (i < 10000) { db.docs.insert({text:"line " + i, index:i}); i++; } Then I'm doing map-reduce based on modulo 10 (so I expect…
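Multiple reduce calls per key are documented MongoDB behavior, not a bug: reduce may be invoked on chunks of values, and earlier reduce outputs are fed back in as values. So the reduce function must be associative and return the same shape it consumes. A sketch contrasting a broken count-style reduce with a correct one (the chunking here simulates Mongo's re-reduce):

```python
def reduce_broken(key, values):
    # Counts how many values arrived; wrong once values are partial sums.
    return len(values)

def reduce_correct(key, values):
    # Sums the values; each emitted value AND each partial result is a count,
    # so the function is safe to re-apply to its own output.
    return sum(values)

emitted = [1] * 1000  # map emitted a count of 1 a thousand times for one key

# Mongo may reduce in chunks, then reduce the chunk results again:
chunks = [emitted[i:i + 100] for i in range(0, 1000, 100)]

partials_broken = [reduce_broken("k", c) for c in chunks]
partials_correct = [reduce_correct("k", c) for c in chunks]

print(reduce_broken("k", partials_broken))    # 10   (wrong: real count lost)
print(reduce_correct("k", partials_correct))  # 1000 (right: associative sum)
```

The usual fix for the question's symptom is exactly this: make reduce treat every incoming value as a possibly-already-reduced partial.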

Successful task generates mapreduce.counters.LimitExceededException when trying to commit

Submitted by 南笙酒味 on 2019-12-12 10:49:26
Question: I have a Pig script running in MapReduce mode that keeps hitting a persistent error I've been unable to fix. The script spawns multiple MapReduce applications; after running for several hours, one of the applications registers as SUCCEEDED but returns the following diagnostic message: We crashed after successfully committing. Recovering. The step that causes the failure tries to perform a RANK over a dataset of around 100 GB, split across roughly 1000 MapReduce output files…
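One plausible cause, given the exception name: Pig's RANK uses one counter per task to stitch global ranks together, so a job with roughly 1000 map outputs can exceed the default counter limit (commonly 120 in Hadoop 2 distributions). A commonly cited mitigation is raising the limit in mapred-site.xml; the property name below is the Hadoop 2 one and should be verified against your distribution:

```xml
<property>
  <name>mapreduce.job.counters.max</name>
  <value>2000</value>
</property>
```

The limit exists to protect the JobTracker/ApplicationMaster heap, so raise it only as far as the job actually needs.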

Issue iterating over custom writable component in reducer

Submitted by 橙三吉。 on 2019-12-12 10:24:53
Question: I am using a custom writable class as VALUEOUT in the map phase of my MR job, where the class has two fields: an org.apache.hadoop.io.Text and an org.apache.hadoop.io.MapWritable. In my reduce function I iterate through the values for each key and perform two operations: 1. filter, 2. aggregate. In the filter, I have some rules to check whether certain values in the MapWritable (with key as Text and value as IntWritable or DoubleWritable) satisfy certain conditions, and then I simply add them to an…
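The classic trap when collecting values while iterating in a reducer: Hadoop reuses a single Writable instance across calls to the iterator, deserializing each record into the same object, so any stored references all end up pointing at the last value. A local Python simulation of the pitfall and the fix (copying each value before keeping it):

```python
import copy

class FakeWritable:
    """Stand-in for a custom Writable whose fields the framework
    overwrites in place on every iteration step."""
    def __init__(self):
        self.fields = {}

def reused_value_iter(raw_values):
    """Mimic Hadoop's reducer Iterable<VALUEIN>: one object, mutated per step."""
    shared = FakeWritable()
    for v in raw_values:
        shared.fields = dict(v)  # framework deserializes into the same object
        yield shared

raw = [{"a": 1}, {"a": 2}, {"a": 3}]

kept_refs = list(reused_value_iter(raw))                # bug: same object 3x
kept_copies = [copy.deepcopy(v) for v in reused_value_iter(raw)]  # fix

print([v.fields for v in kept_refs])    # [{'a': 3}, {'a': 3}, {'a': 3}]
print([v.fields for v in kept_copies])  # [{'a': 1}, {'a': 2}, {'a': 3}]
```

In Java the equivalent fix is constructing a fresh instance (or using WritableUtils.clone) for every value you need to retain past the current iteration step.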