Task data locality NO_PREF. When is it used?

Submitted by 我怕爱的太早我们不能终老 on 2021-01-28 04:12:06

Question


According to the Spark documentation, there are 5 levels of data locality:

  • PROCESS_LOCAL
  • NODE_LOCAL
  • NO_PREF
  • RACK_LOCAL
  • ANY

All of them are pretty clear to me apart from NO_PREF (from the Spark documentation: "data is accessed equally quickly from anywhere and has no locality preference").

In what case would NO_PREF be used?


Answer 1:


One of the characteristics of an RDD is its preferred locations. For example, if the RDD's source is an HDFS file, the preferred locations of each partition are the DataNodes where that partition's data is physically stored. But if it makes no difference where the data comes from, or Spark is unable to determine preferred locations (for example, an RDD created with `sc.parallelize` has none), Spark creates the tasks that process such an RDD with their data locality set to NO_PREF.
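To make this concrete, here is a minimal Python sketch, not Spark's scheduler code, of how a task's preferred locations could map to a locality level for a given executor offer. The `Task` class, the `task_locality` function, the host and executor names, and the `rack_of` host-to-rack map are all hypothetical; in Spark this logic lives in `TaskSetManager`, which also applies delay scheduling that the sketch ignores.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Task:
    # Hypothetical stand-in for a Spark task: the hosts and executor ids its
    # partition prefers. Both lists empty means "no locality preference".
    name: str
    preferred_hosts: List[str] = field(default_factory=list)
    preferred_executors: List[str] = field(default_factory=list)


def task_locality(task: Task, executor_id: str, host: str,
                  rack_of: Optional[Dict[str, str]] = None) -> str:
    """Simplified model: locality level of running `task` on this offer."""
    rack_of = rack_of or {}
    if not task.preferred_hosts and not task.preferred_executors:
        # Nothing to be close to, so every offer is equally good.
        return "NO_PREF"
    if executor_id in task.preferred_executors:
        return "PROCESS_LOCAL"
    if host in task.preferred_hosts:
        return "NODE_LOCAL"
    # Racks of the preferred hosts; hosts with unknown racks are ignored.
    preferred_racks = {rack_of.get(h) for h in task.preferred_hosts} - {None}
    if rack_of.get(host) in preferred_racks:
        return "RACK_LOCAL"
    return "ANY"


hdfs_split = Task("hdfs-split-0", preferred_hosts=["dn1", "dn2"])
parallelized = Task("parallelize-0")
print(task_locality(hdfs_split, "exec-1", "dn1"))
print(task_locality(parallelized, "exec-7", "some-host"))
```

A task whose partition reports no preferred locations, like the hypothetical `parallelize-0` task, comes out as NO_PREF no matter which executor or host makes the offer, because there is no location it could be closer to.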



Source: https://stackoverflow.com/questions/36616897/task-data-locality-no-pref-when-is-it-used
