How to get element by Index in Spark RDD (Java)

失恋的感觉 2020-12-01 06:52

I know the method rdd.first(), which gives me the first element in an RDD.

There is also the method rdd.take(num), which gives me the first "num" elements.

3 Answers
  •  不知归路
    2020-12-01 07:16

    I got stuck on this for a while as well. To expand on Maasg's answer, here is how to select a range of values by index in Java (you'll need to define the four variables at the top):

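    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.sql.DataFrame;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SQLContext;

    DataFrame df;
    SQLContext sqlContext;
    long start;  // first index to keep (inclusive)
    long end;    // end of the range (exclusive)

    // Pair each row with its 0-based index, keep indices in [start, end), then drop the index.
    JavaPairRDD<Row, Long> indexedRDD = df.toJavaRDD().zipWithIndex();
    JavaRDD<Row> filteredRDD = indexedRDD.filter(v1 -> v1._2() >= start && v1._2() < end).keys();
    DataFrame filteredDataFrame = sqlContext.createDataFrame(filteredRDD, df.schema());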
    

    Remember that when you run this code your cluster will need to have Java 8 (as a lambda expression is in use).

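    Also, zipWithIndex can be expensive: when the RDD has more than one partition, it has to run an extra Spark job to count the elements in each partition before it can assign indices.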
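
To make the cost of index-based access more concrete, here is a minimal plain-Java sketch of the general idea behind indexing a partitioned collection. It is only a model under stated assumptions, not Spark's actual implementation: the class `ZipWithIndexSketch`, its methods `startOffsets` and `range`, and the use of nested lists to stand in for partitions are all hypothetical names chosen for illustration. The point is that a global index requires one pass to count every partition before a second pass can assign positions and filter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java model of indexing a partitioned collection (illustrative only, not Spark code).
final class ZipWithIndexSketch {

    // Pass 1: count each partition and turn the counts into starting offsets.
    static long[] startOffsets(List<List<String>> partitions) {
        long[] offsets = new long[partitions.size()];
        long running = 0;
        for (int p = 0; p < partitions.size(); p++) {
            offsets[p] = running;
            running += partitions.get(p).size();
        }
        return offsets;
    }

    // Pass 2: visit every element and keep those whose global index is in [start, end).
    static List<String> range(List<List<String>> partitions, long start, long end) {
        long[] offsets = startOffsets(partitions);
        List<String> kept = new ArrayList<>();
        for (int p = 0; p < partitions.size(); p++) {
            List<String> partition = partitions.get(p);
            for (int i = 0; i < partition.size(); i++) {
                long globalIndex = offsets[p] + i;
                if (globalIndex >= start && globalIndex < end) {
                    kept.add(partition.get(i));
                }
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Three "partitions" holding six elements with global indices 0..5.
        List<List<String>> partitions = Arrays.asList(
                Arrays.asList("a", "b", "c"),
                Arrays.asList("d"),
                Arrays.asList("e", "f"));
        System.out.println(range(partitions, 2, 5)); // prints [c, d, e]
    }
}
```

As in the answer's filter, `start` is inclusive and `end` is exclusive. Note that the whole collection is scanned even when only a few elements are wanted, so it is probably worth caching the indexed RDD if you need several lookups.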
