What is the difference between the Spark JavaRDD methods collect() and collectAsync()?
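I am exploring the Spark 2.0 Java API and have a question about collect() and collectAsync(), both of which are available on JavaRDD.

collect() is an action that brings every element of the RDD back to the driver, so it is typically used to inspect the contents of a small RDD. It is synchronous: the call blocks until the job has finished. collectAsync() is asynchronous: it returns immediately with a future from which all elements of the RDD can be retrieved later. Because the driver thread is not blocked, it can submit other jobs in the meantime, so several RDD actions can run in parallel. Spark schedules jobs in FIFO order by default; to let concurrent jobs share the cluster more evenly, you can switch to the fair scheduler (spark.scheduler.mode=FAIR).

collect(): returns all of the elements in this RDD. In the Java API the result is a List (an array in the Scala API).

List<Integer> data = Arrays.asList(1, 2, 3, 4, 5);
JavaRDD<Integer> rdd = sc.parallelize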
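
To make the blocking versus non-blocking difference concrete, here is a minimal sketch of how the two calls might be used side by side. It is not taken from the original post: the class name CollectVsCollectAsync, the helper methods, the local[2] master, the app name, and the map step are all assumptions chosen for illustration. It relies on JavaRDD.collect() returning a List and JavaRDD.collectAsync() returning a JavaFutureAction, which implements java.util.concurrent.Future.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaFutureAction;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

// Hypothetical demo class; requires Spark (spark-core) on the classpath.
public class CollectVsCollectAsync {

    // Synchronous: the calling thread blocks until the job has finished.
    static List<Integer> collectSync(JavaRDD<Integer> rdd) {
        return rdd.collect();
    }

    // Asynchronous: the job is submitted and a future is returned at once;
    // the thread only blocks when get() is called on that future.
    static List<Integer> collectViaFuture(JavaRDD<Integer> rdd)
            throws InterruptedException, ExecutionException {
        JavaFutureAction<List<Integer>> future = rdd.collectAsync();
        // Other driver-side work, or other job submissions, could go here.
        return future.get();
    }

    public static void main(String[] args) throws Exception {
        SparkConf conf = new SparkConf()
                .setAppName("collect-vs-collectAsync") // assumed name
                .setMaster("local[2]")                 // assumed: local test run
                .set("spark.scheduler.mode", "FAIR");  // optional fair scheduling

        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            List<Integer> data = Arrays.asList(1, 2, 3, 4, 5);
            JavaRDD<Integer> rdd = sc.parallelize(data);
            JavaRDD<Integer> doubled = rdd.map(x -> x * 2);

            // Start the first job without waiting for it.
            JavaFutureAction<List<Integer>> pending = doubled.collectAsync();

            // Run a second job while the first may still be in progress.
            List<Integer> original = collectSync(rdd);

            // Now wait for the asynchronous result.
            List<Integer> doubledResult = pending.get();

            System.out.println(original);
            System.out.println(doubledResult);
        }
    }
}
```

The future can also be cancelled with cancel(true) or polled with isDone(), as with any java.util.concurrent.Future. As with collect(), the whole result still ends up in driver memory, so collectAsync() is only appropriate for results small enough to fit there.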