Extracting rows containing a specific value using MapReduce and Hadoop

Submitted by 别来无恙 on 2019-12-02 09:37:43

Question


I'm new to writing MapReduce functions. Suppose I have a CSV file containing four columns of data.

For example:

101,87,65,67  
102,43,45,40  
103,23,56,34  
104,65,55,40  
105,87,96,40  

Now I want to extract, say:

40 102  
40 104  
40 105  

because those rows contain 40 in the fourth column.

How do I write the map and reduce functions for this?


Answer 1:


Basically, the WordCount example closely resembles what you are trying to achieve. Instead of emitting a count for each word, add a condition that checks whether the tokenized String has the required value, and write to the context only in that case. This works because the Mapper receives each line of the CSV separately.

The Reducer then receives the list of values, already grouped by key. In the Reducer, instead of using IntWritable as the output value type, you can use NullWritable, so your code outputs only the keys. You also do not need the loop in the Reducer, since you only want to output the keys.

I am not providing any code in my answer, since you would learn nothing from that. Make your way from the recommendations.

EDIT: since you modified your question with a request for a Reducer, here are some tips on how you can achieve what you want.

One way to achieve the desired result: in the Mapper, after splitting (or tokenizing) the line, write column 3 to the context as the key and column 0 as the value. Your Reducer, since you do not need any kind of aggregation, can simply write out the keys and values produced by the Mappers (yep, your Reducer code will end up with a single line of code). You can check one of my previous answers; the figure there explains quite well what the Map and Reduce phases are doing.
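For readers who want to check their understanding of the data flow, here is a minimal sketch of the logic in plain Java, with no Hadoop dependency. It only simulates the three phases on an in-memory list: the class name `RowFilterSketch`, the `target` parameter, and the `TreeMap` standing in for the shuffle are all assumptions made for illustration, not part of the Hadoop API. In a real job the `map` and `reduce` bodies would live in Mapper and Reducer subclasses and call `context.write(...)` instead of filling collections.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, Hadoop-free simulation of the map -> shuffle -> reduce flow.
final class RowFilterSketch {

    // "Map" step: called once per CSV line. Emits (column 3, column 0)
    // only when column 3 equals the target value.
    static void map(String line, String target, Map<String, List<String>> shuffled) {
        String[] cols = line.trim().split(",");
        if (cols.length < 4) {
            return; // skip malformed rows
        }
        String key = cols[3].trim();
        if (key.equals(target)) {
            // Stands in for context.write(key, value); grouping the values
            // under their key mimics what the shuffle phase does.
            shuffled.computeIfAbsent(key, k -> new ArrayList<>()).add(cols[0].trim());
        }
    }

    // "Reduce" step: no aggregation, just write each (key, value) pair through.
    static void reduce(String key, List<String> values, List<String> output) {
        for (String value : values) {
            output.add(key + " " + value);
        }
    }

    // Drives the simulated job over an in-memory list of lines.
    static List<String> run(List<String> lines, String target) {
        Map<String, List<String>> shuffled = new TreeMap<>();
        for (String line : lines) {
            map(line, target, shuffled);
        }
        List<String> output = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : shuffled.entrySet()) {
            reduce(entry.getKey(), entry.getValue(), output);
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> csv = Arrays.asList(
                "101,87,65,67",
                "102,43,45,40",
                "103,23,56,34",
                "104,65,55,40",
                "105,87,96,40");
        // Prints one "40 <id>" line per matching row.
        run(csv, "40").forEach(System.out::println);
    }
}
```

In this simulation the values come out in input order. A real Hadoop job makes no such promise about the order of values within a key, so add an explicit sort if the order matters.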



Source: https://stackoverflow.com/questions/37004413/extracting-rows-containing-specific-value-using-mapreduce-and-hadoop
