Counter is not working in reducer code

Posted by 限于喜欢 on 2020-01-15 11:07:27

Question


I am working on a big Hadoop project and there is a small KPI where I have to write only the top 10 values to the reducer output. To meet this requirement, I used a counter and break out of the loop when the counter equals 11, but the reducer still writes all of the values to HDFS.

This is pretty simple Java code, but I am stuck :(

For testing, I created a standalone class (a plain Java application) that does the same thing, and it works there; I'm wondering why it does not work in the reducer code.

Could someone please help me out and suggest what I am missing?

MAP-REDUCE CODE

package comparableTest;
import java.io.IOException;
import java.nio.ByteBuffer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.IntWritable.Comparator;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparator;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.Mapper.Context;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class ValueSortExp2 {
    public static void main(String[] args) throws Exception {

        Configuration conf = new Configuration(true);

        String arguments[] = new GenericOptionsParser(conf, args).getRemainingArgs();

        Job job = new Job(conf, "Test commond");
        job.setJarByClass(ValueSortExp2.class);

        // Setup MapReduce
        job.setMapperClass(MapTask2.class);
        job.setReducerClass(ReduceTask2.class);
        job.setNumReduceTasks(1);

        // Specify key / value
        job.setMapOutputKeyClass(IntWritable.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputKeyClass(IntWritable.class);
        job.setOutputValueClass(Text.class);
        job.setSortComparatorClass(IntComparator2.class);
        // Input
        FileInputFormat.addInputPath(job, new Path(arguments[0]));
        job.setInputFormatClass(TextInputFormat.class);

        // Output
        FileOutputFormat.setOutputPath(job, new Path(arguments[1]));
        job.setOutputFormatClass(TextOutputFormat.class);


        int code = job.waitForCompletion(true) ? 0 : 1;
        System.exit(code);

    }

    public static class IntComparator2 extends WritableComparator {

        public IntComparator2() {
            super(IntWritable.class);
        }

        @Override
        public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {

            Integer v1 = ByteBuffer.wrap(b1, s1, l1).getInt();
            Integer v2 = ByteBuffer.wrap(b2, s2, l2).getInt();

            return v1.compareTo(v2) * (-1);
        }
    }

    public static class MapTask2 extends Mapper<LongWritable, Text, IntWritable, Text> {

        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {

            String tokens[] = value.toString().split("\\t");

            // int empId = Integer.parseInt(tokens[0]);
            int count = Integer.parseInt(tokens[2]);

            context.write(new IntWritable(count), new Text(value));
        }
    }


    public static class ReduceTask2 extends Reducer<IntWritable, Text, IntWritable, Text> {
        int cnt=0;
        public void reduce(IntWritable key, Iterable<Text> list, Context context)
                throws java.io.IOException, InterruptedException {


            for (Text value : list) {
                cnt++;

                if (cnt == 11) {
                    break;
                }

                context.write(new IntWritable(cnt), value);
            }

        }
}
}  

SIMPLE JAVA CODE WORKING FINE

package comparableTest;

import java.io.IOException;
import java.util.ArrayList;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer.Context;

public class TestData {

    //static int cnt=0;


    public static void main(String args[]) throws IOException, InterruptedException {

        ArrayList<String> list = new ArrayList<String>() {{
            add("A");
            add("B");
            add("C");
            add("D");
        }};


        reduce(list);


    }

    public static void reduce(Iterable<String> list)
            throws java.io.IOException, InterruptedException {


        int cnt=0;
        for (String value : list ) {
            cnt ++;

            if (cnt==3)
            {
                break;    
            }

            System.out.println(value);    


        }

    }
}

Sample data (the header is only for information; the actual data starts from the second line):

ID NAME COUNT (need to display the top 10 by COUNT, descending)

1 Toy Story (1995) 2077

10 GoldenEye (1995) 888

100 City Hall (1996) 128

1000 Curdled (1996) 20

1001 Associate, The (L'Associe)(1982) 0

1002 Ed's Next Move (1996) 8

1003 Extreme Measures (1996) 121

1004 Glimmer Man, The (1996) 101

1005 D3: The Mighty Ducks (1996) 142

1006 Chamber, The (1996) 78

1007 Apple Dumpling Gang, The (1975) 232

1008 Davy Crockett, King of the Wild Frontier (1955) 97

1009 Escape to Witch Mountain (1975) 291

101 Bottle Rocket (1996) 253

1010 Love Bug, The (1969) 242

1011 Herbie Rides Again (1974) 135

1012 Old Yeller (1957) 301

1013 Parent Trap, The (1961) 258

1014 Pollyanna (1960) 136

1015 Homeward Bound: The Incredible Journey (1993) 234

1016 Shaggy Dog, The (1959) 156

1017 Swiss Family Robinson (1960) 276

1018 That Darn Cat! (1965) 123

1019 20,000 Leagues Under the Sea (1954) 575

102 Mr. Wrong (1996) 60

1020 Cool Runnings (1993) 392

1021 Angels in the Outfield (1994) 247

1022 Cinderella (1950) 577

1023 Winnie the Pooh and the Blustery Day (1968) 221

1024 Three Caballeros, The (1945) 126

1025 Sword in the Stone, The (1963) 293

1026 So Dear to My Heart (1949) 8

1027 Robin Hood: Prince of Thieves (1991) 344

1028 Mary Poppins (1964) 1011

1029 Dumbo (1941) 568

103 Unforgettable (1996) 33

1030 Pete's Dragon (1977) 323

1031 Bedknobs and Broomsticks (1971) 319


Answer 1:


If you move int cnt=0; inside the reduce method (as the first statement of this method), you will get the first 10 values for each key (I guess this is what you want).
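
For illustration, here is a minimal sketch of that first variant, written as an untested drop-in for the nested ReduceTask2 class in the question (it relies on the imports already present in that file). Since cnt is now a local variable, it restarts at zero for every key:

    public static class ReduceTask2 extends Reducer<IntWritable, Text, IntWritable, Text> {

        @Override
        public void reduce(IntWritable key, Iterable<Text> list, Context context)
                throws IOException, InterruptedException {

            int cnt = 0;                          // local counter: reset for every key
            for (Text value : list) {
                cnt++;
                if (cnt == 11) {
                    break;                        // stop after 10 values of this key
                }
                context.write(new IntWritable(cnt), value);
            }
        }
    }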

Otherwise, as it is now, your counter keeps increasing across keys, so you only skip the 11th value overall: the break merely exits the loop for the current key, and since cnt never equals 11 again, writing continues with the 12th value.

If you want to write only 10 values in total (regardless of key), leave the cnt initialization where it is and change your if condition to if (cnt > 10)... However, this is not good practice, so you may need to reconsider your algorithm: assuming you don't want 10 arbitrary values, how do you know which key will be processed first in a distributed environment, when you have more than one reducer and a hash partitioner?
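
For completeness, a sketch of that second variant (10 values overall), again as an untested drop-in for the nested ReduceTask2 class. It only yields the global top 10 under the job setup shown in the question, i.e. a single reducer (setNumReduceTasks(1)) together with the descending sort comparator, so the first 10 records the reducer sees are the highest counts:

    public static class ReduceTask2 extends Reducer<IntWritable, Text, IntWritable, Text> {

        private int cnt = 0;                      // instance field: shared across all keys seen by this reducer

        @Override
        public void reduce(IntWritable key, Iterable<Text> list, Context context)
                throws IOException, InterruptedException {

            for (Text value : list) {
                cnt++;
                if (cnt > 10) {
                    break;                        // 10 records already written; every later call exits immediately
                }
                context.write(new IntWritable(cnt), value);
            }
        }
    }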



Source: https://stackoverflow.com/questions/46087100/counter-is-not-working-in-reducer-code
