MapReduce find word length frequency

Submitted by 柔情痞子 on 2019-12-23 01:18:23

Question


I am new to MapReduce and would like an idea for how to compute word length frequency with it. I already have the code for word count, but I want to count by word length instead. This is what I have so far:

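import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCount {

    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        // Emits (word, 1) for every whitespace-separated token in the line.
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                context.write(word, one);
            }
        }
    }
}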

Thanks ...


Answer 1:


For word length frequency, tokenizer.nextToken() should not be emitted as the key; the length of that token should be used instead. Your code will work with just the following change:

word.set( String.valueOf( tokenizer.nextToken().length() ));  

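On closer inspection, the mapper's output key no longer needs to be Text, even though that works. It is better to use an IntWritable key instead. If you do, the reducer's input key type and the map output key class configured on the job need to change to IntWritable as well: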

public static class Map extends Mapper<LongWritable, Text, IntWritable, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private IntWritable wordLength = new IntWritable();

    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            wordLength.set(tokenizer.nextToken().length());
            context.write(wordLength, one);
        }
    }
}
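
To check what the job should produce end to end, the logic can be sketched in plain Java without any Hadoop classes. This is only an illustration: the class name WordLengthFrequencySketch and the method lengthFrequencies are made up for this example, and the summing step assumes the reducer is the usual word-count reducer that adds up the ones for each key.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical plain-Java stand-in for the job; no Hadoop classes are used.
class WordLengthFrequencySketch {

    // "Map" step: produce (length, 1) for each token.
    // "Reduce" step (assumed to be the usual summing reducer): add the ones per length.
    static Map<Integer, Integer> lengthFrequencies(Iterable<String> lines) {
        // TreeMap keeps keys sorted, much as reducer input arrives sorted by key.
        Map<Integer, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                int length = tokenizer.nextToken().length();
                counts.merge(length, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> result = lengthFrequencies(
                Arrays.asList("the quick brown fox", "jumps over the lazy dog"));
        // Prints one "length<TAB>count" pair per line: 3 -> 4, 4 -> 2, 5 -> 3.
        result.forEach((length, count) -> System.out.println(length + "\t" + count));
    }
}
```

In the real job the grouping and sorting are done by the framework between the map and reduce phases; the sketch collapses both phases into one loop purely to show the expected output.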

Although most MapReduce examples use StringTokenizer, it is a legacy class that the Java documentation discourages in new code. Using the String.split method is cleaner and advisable, so change the code accordingly.
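
A minimal sketch of the split-based tokenization, in plain Java so it can be tried without Hadoop. The class name SplitTokenizerSketch and the method tokenLengths are hypothetical, and splitting on the regex \s+ is an assumption about what should count as a word boundary; it roughly matches StringTokenizer's default whitespace delimiters.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper showing String.split in place of StringTokenizer.
class SplitTokenizerSketch {

    // Returns the length of each whitespace-separated token in the line.
    static List<Integer> tokenLengths(String line) {
        List<Integer> lengths = new ArrayList<>();
        // trim() first: with leading whitespace, split would return an empty first element.
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            // "".split("\\s+") yields one empty string, so handle blank lines explicitly.
            return lengths;
        }
        for (String token : trimmed.split("\\s+")) {
            lengths.add(token.length());
        }
        return lengths;
    }

    public static void main(String[] args) {
        System.out.println(tokenLengths("  hello  big world ")); // prints [5, 3, 5]
    }
}
```

Inside the mapper, the loop body would become wordLength.set(token.length()) followed by context.write(wordLength, one). Unlike StringTokenizer, split does not silently skip empty tokens, which is why the blank-line and leading-whitespace cases are handled here.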



Source: https://stackoverflow.com/questions/26556972/mapreduce-find-word-length-frequency
