I use the following command to fill an RDD with a bunch of arrays, each containing two strings: ["filename", "content"].
Now I want to iterate over every one of those occurre
I would try a partition mapping function (mapPartitions). The code below shows how an entire RDD can be processed in a loop so that every element passes through the same function. I don't know Scala, so all I can offer is Java code, but it should not be difficult to translate it into Scala.
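import java.util.ArrayList;
import java.util.Iterator;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.FlatMapFunction;

// Assumption: `file` is a JavaRDD<String>; the generic types are restored on that basis.
// Spark 1.x signature (call returns Iterable); in Spark 2.x, return tmpRes.iterator() instead.
JavaRDD<String[]> res = file.mapPartitions(new FlatMapFunction<Iterator<String>, String[]>() {
    @Override
    public Iterable<String[]> call(Iterator<String> t) throws Exception {
        ArrayList<String[]> tmpRes = new ArrayList<>();
        while (t.hasNext()) {
            t.next(); // advance the iterator, otherwise the loop never terminates
            String[] fillData = new String[2]; // fresh array for each element
            fillData[0] = "filename";
            fillData[1] = "content";
            tmpRes.add(fillData);
        }
        return tmpRes;
    }
}).cache();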
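To try the per-partition loop without a Spark cluster, the body of call() can be pulled out into a plain method. This is only a sketch: the class name PartitionSketch, the method processPartition, and the "filename:length" output format are my own assumptions for illustration, not part of the Spark API or of the original question.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Spark-free sketch; PartitionSketch and processPartition are hypothetical names.
class PartitionSketch {

    // Consumes one partition of [filename, content] pairs and returns one
    // summary string per pair, in the form "filename:contentLength".
    static List<String> processPartition(Iterator<String[]> partition) {
        List<String> results = new ArrayList<>();
        while (partition.hasNext()) {
            // next() advances the iterator; without it the loop never terminates.
            String[] pair = partition.next();
            String filename = pair[0];
            String content = pair[1];
            results.add(filename + ":" + content.length());
        }
        return results;
    }
}
```

Inside mapPartitions, call() would simply delegate to this method: on Spark 1.x it can return the list directly, while on Spark 2.x it would return results.iterator(), because FlatMapFunction.call returns an Iterator there.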