Get input file in Reducer

女生的网名这么多〃 Submitted on 2019-12-12 03:39:40

Question


I am trying to write a mapreduce job where I need to iterate the values twice.

Given a numerical CSV file, I need to apply a min-max normalization to each column.

For that I need to find the min and max values of each column and use them in the equation below.

What I did so far is

In map()
I emit the column id as the key and each column value as the value.
In reduce()
I calculate the min and max values of each column.

After that I am stuck. Next my aim is to apply the equation

v' = ((v − minA) / (maxA − minA)) * (newMaxA − newMinA) + newMinA

My newMaxA and newMinA are 0.1 and 0.0 respectively, and I already have each column's max and min. To apply the equation I need v, i.e. the values from the input file.
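For reference, the normalization equation on its own can be sketched in plain Java, outside of Hadoop (the class name `MinMaxScaler`, the `scale` method, and the sample column values are illustrative, not from the original post):

```java
public class MinMaxScaler {
    // v' = (v - minA) / (maxA - minA) * (newMaxA - newMinA) + newMinA
    static double scale(double v, double minA, double maxA,
                        double newMinA, double newMaxA) {
        return (v - minA) / (maxA - minA) * (newMaxA - newMinA) + newMinA;
    }

    public static void main(String[] args) {
        // Example: a column whose observed min is 4.6 and max is 5.4,
        // rescaled into the target range [0.0, 0.1].
        for (double v : new double[]{5.3, 4.9, 4.6, 5.4}) {
            System.out.println(scale(v, 4.6, 5.4, 0.0, 0.1));
        }
    }
}
```

Note that the column minimum maps to newMinA and the column maximum to newMaxA, with all other values falling proportionally in between.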

How to get that?

What I thought was this:

Take the first row from the input CSV file (the iris dataset):

[5.3,3.6,1.6,0.3]

Apply the equation to each attribute and emit the entire row (the min and max values are already known in the reducer). But in the reducer I only get the column values. Alternatively, I could read my input file in the reducer's setup() method.

Is that a good practice? Any suggestions?

UPDATE

As Mark Vickery suggested, I did the following.

public void reduce(Text key, Iterable<DoubleWritable> values, Context context)
        throws IOException, InterruptedException {
    System.out.println("in reducer");
    // max = 0 would fail for all-negative columns
    double min = Double.MAX_VALUE, max = -Double.MAX_VALUE;
    Iterator<DoubleWritable> iterator = values.iterator();
    ListIterator<DoubleWritable> lit = IteratorUtils.toListIterator(iterator);
    System.out.println("Using ListIterator 1st pass");
    while (lit.hasNext()) {
        DoubleWritable value = lit.next();  // call next() only once per loop
        System.out.println(value);
        if (value.get() < min) {
            min = value.get();
        }
        if (value.get() > max) {
            max = value.get();
        }
    }
    System.out.println(min);
    System.out.println(max);

    // move the list iterator back to the start
    while (lit.hasPrevious()) {
        lit.previous();
    }

    System.out.println("Using ListIterator 2nd pass");
    while (lit.hasNext()) {
        System.out.println(lit.next());
    }
}

In the 1st pass I can read all the values correctly, but in the 2nd pass I only get a single element repeated.


Answer 1:


You could enumerate over the reducer values twice within the same reduce call: the first time to calculate the min and max, and the second time to compute each normalized value and emit it.

Rough example:

public void Reduce(string key, List<string> values, Context context)
{
    var minA = Min(values);
    var maxA = Max(values);

    foreach (var v in values)
    {
        var result = (v - minA) / (maxA - minA) * (newMaxA - newMinA) + newMinA;

        context.Emit(result);
    }
}
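The two-pass idea above can be sketched as runnable plain Java, assuming the values have already been materialized into a regular `List<Double>` that can safely be traversed twice (the class and method names are illustrative; Hadoop types are omitted):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class TwoPassReduce {
    // First pass: find minA/maxA. Second pass: emit each normalized value.
    static List<Double> reduce(List<Double> values, double newMinA, double newMaxA) {
        double minA = Collections.min(values);  // pass 1
        double maxA = Collections.max(values);
        List<Double> out = new ArrayList<>();
        for (double v : values) {               // pass 2
            out.add((v - minA) / (maxA - minA) * (newMaxA - newMinA) + newMinA);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(reduce(Arrays.asList(5.3, 4.9, 4.6, 5.4), 0.0, 0.1));
    }
}
```

In a real Hadoop reducer the `Iterable<DoubleWritable>` cannot simply be traversed twice, which is what Answer 2 below addresses.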



Answer 2:


I found the answer. If we try to iterate twice in the reducer like this:

    ListIterator<DoubleWritable> lit = IteratorUtils.toListIterator(it);
    System.out.println("Using ListIterator 1st pass");
    while(lit.hasNext())
        System.out.println(lit.next());

    // move the list iterator back to start
    while(lit.hasPrevious())
        lit.previous();

    System.out.println("Using ListIterator 2nd pass");
    while(lit.hasNext())
        System.out.println(lit.next());

we will get only the following output:

Using ListIterator 1st pass
5.3
4.9
5.3
4.6
4.6
Using ListIterator 2nd pass
5.3
5.3
5.3
5.3
5.3

To get it right, we have to copy each value into our own list, because Hadoop reuses the same Writable object for every value it passes to the reducer:

ArrayList<DoubleWritable> cache = new ArrayList<DoubleWritable>();
for (DoubleWritable aNum : values) {
    System.out.println("first iteration: " + aNum);
    DoubleWritable writable = new DoubleWritable();
    writable.set(aNum.get());  // copy the value, not the reused object
    cache.add(writable);
}
int size = cache.size();
for (int i = 0; i < size; ++i) {
    System.out.println("second iteration: " + cache.get(i));
}

Output

first iteration: 5.3
first iteration: 4.9
first iteration: 5.3
first iteration: 4.6
first iteration: 4.6
second iteration: 5.3
second iteration: 4.9
second iteration: 5.3
second iteration: 4.6
second iteration: 4.6
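The underlying cause is that Hadoop hands the reducer the same mutable Writable instance for every value, overwriting it in place; caching references therefore stores one object many times. A plain-Java sketch of the pitfall, with a hypothetical `Box` class standing in for `DoubleWritable` (the exact value that gets repeated in a real job depends on the iterator implementation):

```java
import java.util.ArrayList;
import java.util.List;

public class WritableReuseDemo {
    // Stand-in for DoubleWritable: a mutable container the framework reuses.
    static class Box { double v; }

    // Buggy: caches references to the single reused object.
    static List<Double> cacheByReference(double[] input) {
        Box shared = new Box();
        List<Box> cache = new ArrayList<>();
        for (double d : input) {
            shared.v = d;        // framework overwrites in place
            cache.add(shared);   // same object added every time
        }
        List<Double> out = new ArrayList<>();
        for (Box b : cache) out.add(b.v);
        return out;
    }

    // Correct: copies the current value before caching.
    static List<Double> cacheByCopy(double[] input) {
        Box shared = new Box();
        List<Double> cache = new ArrayList<>();
        for (double d : input) {
            shared.v = d;
            cache.add(shared.v); // value copied out of the reused object
        }
        return cache;
    }

    public static void main(String[] args) {
        double[] input = {5.3, 4.9, 5.3, 4.6, 4.6};
        System.out.println(cacheByReference(input)); // all entries show the last value
        System.out.println(cacheByCopy(input));      // values preserved in order
    }
}
```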


Source: https://stackoverflow.com/questions/22005722/get-input-file-in-reducer
