How to use an RDD inside another RDD's map method?

廉价感情, submitted on 2019-12-05 16:55:42

Think of RDDs as virtual collections. An RDD reference only points to where the data is and holds no data itself, so there is no point in using it inside a closure.

Instead, use operations that combine RDDs to achieve the desired result. Also, lookup as defined here is a very sequential process that requires all of the lookup data to be available in the memory of each worker, so it will not scale.

To resolve every element of the file RDD to its value in index, join the two RDDs:

val resolvedFileRDD = file.keyBy(identity).join(index) // this will have the form of (key, (key,index of key)) 
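A minimal, self-contained sketch of the join approach follows. It assumes spark-core is on the classpath, that file is an RDD[String] and index is an RDD[(String, Long)] (the question's exact types are not shown, so these are assumptions), and it uses a hypothetical object JoinInsteadOfLookup with a hypothetical helper resolve plus made-up sample data. Everything runs on the cluster through a single join; no RDD is referenced inside another RDD's closure.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical object name, used only for illustration.
object JoinInsteadOfLookup {
  // Hypothetical helper: resolves every element of `file` to its value in `index`
  // with a join, instead of calling index.lookup(...) inside file.map(...).
  def resolve(file: RDD[String], index: RDD[(String, Long)]): RDD[(String, Long)] = {
    // keyBy(identity) turns each element x into (x, x); join then matches on that key.
    // The joined RDD has the form (key, (key, index of key)).
    val resolvedFileRDD: RDD[(String, (String, Long))] = file.keyBy(identity).join(index)
    // Drop the duplicated key and keep (element, index of element).
    resolvedFileRDD.mapValues { case (_, idx) => idx }
  }

  def main(args: Array[String]): Unit = {
    // local[*] is only for trying this out; on a cluster the master comes from spark-submit.
    val conf = new SparkConf().setAppName("join-instead-of-lookup").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Hypothetical sample data standing in for the question's `file` and `index`.
      val file: RDD[String] = sc.parallelize(Seq("a", "b", "a", "c"))
      val index: RDD[(String, Long)] = sc.parallelize(Seq("a" -> 0L, "b" -> 1L, "c" -> 2L))
      // Output order is not guaranteed after a join.
      resolve(file, index).collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Note that join is an inner join: elements of file with no entry in index are dropped, while duplicates in file are kept (one output row per occurrence). If unmatched elements must survive, leftOuterJoin yields RDD[(String, (String, Option[Long]))] instead.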