In a MapReduce job, how to send an ArrayList as a value from the mapper to the reducer [duplicate]

Question

How can we pass an ArrayList as a value from the mapper to the reducer?

My code applies a set of rules and creates new values (Strings) based on them. I keep all the outputs generated by the rule execution in a list, and now need to send this list (the mapper's output value) to the reducer, but I do not see a way to do so.

Can someone please point me in the right direction?

Adding Code

package develop;

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

import utility.RulesExtractionUtility;

public class CustomMap{


    public static class CustomerMapper extends Mapper<Object, Text, Text, Text> {
        private Map<String, String> rules;
        @Override
        public void setup(Context context) throws IOException
        {
            // Let an IOException from reading the rules file propagate, so the
            // framework marks the task as failed instead of the JVM exiting silently.
            URI[] cacheFiles = context.getCacheFiles();
            setupRulesMap(cacheFiles[0].toString());
        }

        @Override
        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {

            // Example contents of "rules", which build list values from each line of the source file:
            //   rules.put("targetcolumn[1]", "ASSIGN(source[0])");
            //   rules.put("targetcolumn[2]", "INCOME(source[2]+source[3])");
            //   rules.put("targetcolumn[3]", "ASSIGN(source[1])");

            String[] splitSource = value.toString().split(" ");

            List<String> lists = RulesExtractionUtility.rulesEngineExecutor(splitSource, rules);

            // "lists" holds values such as (name, age) for each line of a huge text file.
            // This is what I want to write to the context and pass to the reducer.
            // I have not implemented the reducer yet, because I am stuck on passing the value from the mapper.

            // context.write(new Text(), lists);  // I do not have a way of doing this
        }




        private void setupRulesMap(String filename) throws IOException
        {
            Map<String, String> rule = new LinkedHashMap<String, String>();
            // try-with-resources closes the reader even if a line fails to parse
            try (BufferedReader reader = new BufferedReader(new FileReader(filename)))
            {
                String line = reader.readLine();
                while (line != null)
                {
                    // each line has the form target=expression
                    String[] split = line.split("=");
                    rule.put(split[0], split[1]);
                    line = reader.readLine();

                    // rules logic
                }
            }
            rules = rule;
        }
    }
    public static void main(String[] args) throws IllegalArgumentException, IOException, ClassNotFoundException, InterruptedException, URISyntaxException {

        Configuration conf = new Configuration();
        if (args.length != 2) {
            System.err.println("Usage: customerMapper <in> <out>");
            System.exit(2);
        }
        Job job = Job.getInstance(conf);
        job.setJarByClass(CustomMap.class);
        job.setMapperClass(CustomerMapper.class);
        job.addCacheFile(new URI("Some HDFS location"));

        URI[] cacheFiles = job.getCacheFiles();
        if (cacheFiles != null) {
            for (URI cacheFile : cacheFiles) {
                System.out.println("Cache file ->" + cacheFile);
            }
        }
        // job.setReducerClass(Reducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Answer 1:

To pass an ArrayList from the mapper to the reducer, the value type must implement Hadoop's Writable interface. Why don't you try this library?

<dependency>
    <groupId>org.apache.giraph</groupId>
    <artifactId>giraph-core</artifactId>
    <version>1.1.0-hadoop2</version>
</dependency>

It has an abstract class:

public abstract class ArrayListWritable<M extends org.apache.hadoop.io.Writable>
extends ArrayList<M>
implements org.apache.hadoop.io.Writable, org.apache.hadoop.conf.Configurable

You could create your own subclass that implements its abstract methods and, where needed, overrides the interface methods with your own code. For instance:

public class MyListWritable extends ArrayListWritable<Text>{
    ...
}
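
If adding Giraph only for this one class is not desirable, a list value can also be written by hand. The sketch below is an assumption-laden illustration, not Giraph's implementation: StringListWritable and its toBytes/fromBytes helpers are hypothetical names, and the class deliberately omits "implements org.apache.hadoop.io.Writable" so that it compiles with the JDK alone. Its write and readFields methods have the same signatures that Writable requires, so adding the implements clause should be the only change needed on a Hadoop classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical class, not part of Hadoop or Giraph. In a real job it would
// declare "implements org.apache.hadoop.io.Writable".
class StringListWritable {
    private final List<String> values = new ArrayList<>();

    // Hadoop creates value objects reflectively, so a no-arg constructor is needed.
    public StringListWritable() { }

    public StringListWritable(List<String> initial) {
        values.addAll(initial);
    }

    public List<String> get() {
        return values;
    }

    // Same signature as Writable.write: element count first, then each element.
    public void write(DataOutput out) throws IOException {
        out.writeInt(values.size());
        for (String v : values) {
            out.writeUTF(v);
        }
    }

    // Same signature as Writable.readFields. Value objects may be reused
    // between records, so the old contents are cleared first.
    public void readFields(DataInput in) throws IOException {
        values.clear();
        int n = in.readInt();
        for (int i = 0; i < n; i++) {
            values.add(in.readUTF());
        }
    }

    // Helper for trying the round trip without a cluster: object -> bytes.
    static byte[] toBytes(StringListWritable w) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try {
            w.write(new DataOutputStream(buffer));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Helper for trying the round trip without a cluster: bytes -> object.
    static StringListWritable fromBytes(byte[] bytes) {
        StringListWritable w = new StringListWritable();
        try {
            w.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return w;
    }

    public static void main(String[] args) {
        StringListWritable sent = new StringListWritable(Arrays.asList("alice", "30"));
        StringListWritable received = fromBytes(toBytes(sent));
        System.out.println(received.get());
    }
}
```

With such a class as the mapper's output value, the Mapper's fourth type parameter would change from Text to the new class, and the driver would need job.setMapOutputValueClass(StringListWritable.class). Note that writeUTF limits each string to 65535 encoded bytes; longer elements would need a length-prefixed byte array instead.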

Answer 2:

One way to do that (probably neither the only one nor the best one) would be to:

1. serialize your list into a string and pass it as the output value in the mapper
2. deserialize the string and rebuild your list when you read the input value in the reducer

If you do so, you should also get rid of all special symbols in the string containing the serialized list (symbols like \n or \t, for instance). An easy way to achieve that is to use Base64-encoded strings.
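
The serialize/deserialize steps above can be sketched with the JDK's java.util.Base64. This is a minimal illustration under stated assumptions: ListCodec, encode, and decode are hypothetical names, and the wire format (each element Base64-encoded and followed by a comma) is one arbitrary choice among many. The comma is safe as a terminator because it is not part of the Base64 alphabet.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Base64;
import java.util.List;

// Hypothetical helper, not part of Hadoop.
final class ListCodec {
    // ',' never appears in Base64 output, so it cannot clash with the data.
    private static final String TERMINATOR = ",";

    private ListCodec() { }

    // Encodes every element separately and ends each one with the terminator,
    // so an empty list ("") and a list holding one empty string (",") differ.
    static String encode(List<String> items) {
        Base64.Encoder encoder = Base64.getEncoder();
        StringBuilder sb = new StringBuilder();
        for (String item : items) {
            sb.append(encoder.encodeToString(item.getBytes(StandardCharsets.UTF_8)));
            sb.append(TERMINATOR);
        }
        return sb.toString();
    }

    // Reverses encode(). The piece after the final terminator is always empty
    // and is skipped.
    static List<String> decode(String encoded) {
        Base64.Decoder decoder = Base64.getDecoder();
        String[] parts = encoded.split(TERMINATOR, -1);
        List<String> items = new ArrayList<>();
        for (int i = 0; i < parts.length - 1; i++) {
            items.add(new String(decoder.decode(parts[i]), StandardCharsets.UTF_8));
        }
        return items;
    }

    public static void main(String[] args) {
        List<String> original = Arrays.asList("John Smith", "42", "tab\there");
        String wire = encode(original);
        System.out.println(wire);
        System.out.println(decode(wire).equals(original));
    }
}
```

In the mapper from the question, the call would then look like context.write(someKey, new Text(ListCodec.encode(lists))), where someKey is whatever Text key the job needs, and the reducer would rebuild the list with ListCodec.decode(value.toString()). The encoded string is roughly a third larger than the raw data, which is the price of avoiding delimiter clashes.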

Answer 3:

You should send Text objects instead of String objects. Then you can call toString() on each value in your reducer. Be sure to configure your driver properly.

If you post your code we will help you further.

Source: https://stackoverflow.com/questions/30945769/in-a-mapreduce-how-to-send-arraylist-as-value-from-mapper-to-reducer

Tags

java

Hadoop

arraylist

MapReduce