Java MapReduce counting by date

后端未结

关注

 2  887

萌比男神i 2021-01-28 09:16

I\'m new to Hadoop, and i\'m trying to do a MapReduce program, to count the max first two occurrencise of lecters by date (grouped by month). So my input is of this kind :

2条回答

萌比男神i (楼主)

2021-01-28 10:15

I think you're trying to do too much work in the Mapper. You only need to group the dates (which it seems you aren't formatting them correctly anyway based on your expected output).

The following approach is going to turn these lines, for example

2017-07-01 , A, B, A, C, B, E, F
2017-07-05 , A, B, A, G, B, G, G

Into this pair for the reducer

2017-07 , ("A,B,A,C,B,E,F", "A,B,A,G,B,G,G")

In other words, you gain no real benefit by using an ArrayWritable, just keep it as text.

So, the Mapper would look like this

class CustomMap extends Mapper {

    private final Text key = new Text();
    private final Text output = new Text();

    @Override
    protected void map(LongWritable offset, Text value, Context context) throws IOException, InterruptedException {

        int separatorIndex = value.find(",");

        final String valueStr = value.toString();
        if (separatorIndex < 0) {
            System.err.printf("mapper: not enough records for %s", valueStr);
            return;
        }
        String dateKey = valueStr.substring(0, separatorIndex).trim();
        String tokens = valueStr.substring(1 + separatorIndex).trim().replaceAll("\\p{Space}", "");

        SimpleDateFormat fmtFrom = new SimpleDateFormat("yyyy-MM-dd");
        SimpleDateFormat fmtTo = new SimpleDateFormat("yyyy-MM");

        try {
            dateKey = fmtTo.format(fmtFrom.parse(dateKey));
            key.set(dateKey);
        } catch (ParseException ex) {
            System.err.printf("mapper: invalid key format %s", dateKey);
            return;
        }

        output.set(tokens);
        context.write(key, output);
    }
}

And then the reducer can build a Map that collects and counts the values from the value strings. Again, writing out only Text.

class CustomReduce extends Reducer {

    private final Text output = new Text();

    @Override
    protected void reduce(Text date, Iterable values, Context context) throws IOException, InterruptedException {

        Map keyMap = new TreeMap<>();
        for (Text v : values) {
            String[] keys = v.toString().trim().split(",");

            for (String key : keys) {
                if (!keyMap.containsKey(key)) {
                    keyMap.put(key, 0);
                }
                keyMap.put(key, 1 + keyMap.get(key));
            }
        }

        output.set(mapToString(keyMap));
        context.write(date, output);
    }

    private String mapToString(Map map) {
        StringBuilder sb = new StringBuilder();
        String delimiter = ", ";
        for (Map.Entry entry : map.entrySet()) {
            sb.append(
                    String.format("%s:%d", entry.getKey(), entry.getValue())
            ).append(delimiter);
        }
        sb.setLength(sb.length()-delimiter.length());
        return sb.toString();
    }
}

Given your input, I got this

2017-06 A:4, B:4, C:1, E:4, F:3, K:1, Q:2, R:1, T:1
2017-07 A:4, B:4, C:1, E:1, F:1, G:3

0 讨论(0)

查看其它2个回答