Why does foreach not bring anything to the driver program?

野趣味 2020-12-14 10:24

I wrote this program in the Spark shell:

val array = sc.parallelize(List(1, 2, 3, 4))
array.foreach(x => println(x))

this prints some debug s

2 Answers
  •  生来不讨喜
    2020-12-14 10:42

    You can use RDD.toLocalIterator() to bring the data to the driver (one RDD partition at a time):

    val array = sc.parallelize(List(1, 2, 3, 4))
    for(rec <- array.toLocalIterator) { println(rec) }
    

    See also

    • Spark: Best practice for retrieving big data from RDD to local machine
    • this blog post about toLocalIterator
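
As for why foreach shows nothing useful on the driver: the function passed to RDD.foreach runs on the executors, so its println output goes to each executor's stdout, and that is the driver console only when Spark runs in local mode. A minimal sketch of the usual alternatives, assuming a spark-shell session where `sc` is the predefined SparkContext (as in the question); the names `rdd`, `collected` and `firstTwo` are only illustrative:

```scala
// Assumes spark-shell, where `sc` (the SparkContext) already exists.
val rdd = sc.parallelize(List(1, 2, 3, 4))

// Runs on the executors: the output lands in executor stdout,
// not necessarily in the driver console.
rdd.foreach(x => println(x))

// collect() returns the whole RDD to the driver as an Array[Int],
// so this println runs on the driver.
// Only safe when the data fits in driver memory.
val collected: Array[Int] = rdd.collect()
collected.foreach(println)

// take(n) brings back only the first n elements, which is cheaper
// when a sample is enough.
val firstTwo: Array[Int] = rdd.take(2)
firstTwo.foreach(println)
```

For data too large to collect at once, toLocalIterator (shown in the answer above) is probably the better fit, since it needs only as much driver memory as the largest partition.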
