How do I iterate RDDs in Apache Spark (Scala)?

遇见更好的自我 2020-12-01 01:52

I use the following command to fill an RDD with a bunch of arrays containing 2 strings ["filename", "content"].

Now I want to iterate over each of those occurrences.
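
For illustration, one way such an RDD could be built (the actual command is not shown above) is with SparkContext.wholeTextFiles, which pairs each file path with its content:

    // Hypothetical setup; wholeTextFiles yields an RDD[(path, content)],
    // which can be mapped into Array(filename, content) pairs like those described above.
    import org.apache.spark.{SparkConf, SparkContext}

    val sc   = new SparkContext(new SparkConf().setAppName("iterate-rdd-example"))
    val file = sc.wholeTextFiles("hdfs:///path/to/input")   // placeholder input path
      .map { case (name, content) => Array(name, content) }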

5 Answers
  •  攒了一身酷 2020-12-01 01:59

    I would try using a partition mapping function (mapPartitions). The code below shows how an entire RDD can be processed partition by partition, so that each input record goes through the very same function. I'm afraid I have no knowledge of Scala, so all I can offer is Java code; however, it should not be very difficult to translate it into Scala.

    import java.util.ArrayList;
    import java.util.Iterator;

    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.function.FlatMapFunction;

    JavaRDD<String[]> res = file.mapPartitions(new FlatMapFunction<Iterator<String[]>, String[]>(){
          @Override
          public Iterable<String[]> call(Iterator<String[]> t) throws Exception {

              // collect one output record per input record in this partition
              ArrayList<String[]> tmpRes = new ArrayList<>();

              while(t.hasNext()){
                   String[] input = t.next();   // advance the iterator (otherwise the loop never ends)

                   String[] fillData = new String[2];
                   fillData[0] = input[0];      // "filename"
                   fillData[1] = input[1];      // "content"

                   tmpRes.add(fillData);
              }

              return tmpRes;                    // ArrayList already implements Iterable<String[]>
          }

      }).cache();
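
    A rough Scala translation of the above might look like the sketch below (untested; it assumes file is an RDD[Array[String]] whose elements hold the filename and the content, as in the question):

    // Sketch of a Scala equivalent of the mapPartitions call above.
    val res = file.mapPartitions { iter =>
      iter.map { pair =>
        val filename = pair(0)
        val content  = pair(1)
        // process filename/content here, then emit a record
        Array(filename, content)
      }
    }.cache()

    // Or, to simply run a side effect for every record (this executes on the
    // executors, so any println output ends up in the executor logs):
    file.foreach { pair => println(pair(0) + " -> " + pair(1).length + " chars") }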
    
