What is the best structure to choose for updating a node's property in Spark GraphX?

丶灬走出姿态 submitted on 2020-03-11 14:49:07

Question


I have been looking for a while for a way to update node properties in GraphX. I am working on a graph whose nodes each carry a property, for example (1,(2,true)). Here 1 is the node ID, 2 is the node's label, and true indicates that the node has been visited. I loaded the graph with GraphLoader, so it is a distributed graph backed by RDDs.

The structure I use for every node is as follows:

case class nodes_properties(label: Int, isVisited: Boolean = false)
var work_graph = graph.mapVertices { case (node, property) => nodes_properties(node.toInt, false) }.cache()

When I want to update a node's property (for example, its label), I use the following:

work_graph = work_graph.mapVertices((vid: VertexId, v: nodes_properties) => {
  if (vid == my_node) nodes_properties(newLabel, true)
  else v
})

This does what I want, but it is very expensive computationally: for a graph with only 30000 nodes it takes about 4 minutes, whereas the same operations take about 25 seconds in MATLAB.

Question: Is there a better structure, or a more efficient method, for updating node properties in a graph while the algorithm runs? This is a real bottleneck for me and I have not been able to solve it.

I should mention that the algorithm is iterative, and at each iteration I need to update node properties based on some conditions.

NOTE: I already use unpersistVertices() and graph.checkpoint(), but updating node properties this way is still very time-consuming.
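
For reference, one direction that might be worth measuring is sketched below. It assumes (the loop itself is not shown above) that mapVertices is called once per updated node, so every single-node change rescans all vertices. The sketch instead collects all changes of one iteration into an RDD of (vertex ID, new label) pairs and applies them with a single joinVertices call. The names NodeProps, BatchedVertexUpdate, and applyUpdates, as well as the tiny three-node graph, are hypothetical and only for illustration; whether this helps depends on how many updates each iteration actually produces.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Graph, VertexId}
import org.apache.spark.rdd.RDD

// Same shape as the vertex property in the question (hypothetical name).
case class NodeProps(label: Int, isVisited: Boolean = false)

object BatchedVertexUpdate {
  // Applies every (vertexId -> newLabel) pair of one iteration in a single pass.
  // Vertices that do not appear in `updates` keep their current property.
  def applyUpdates(graph: Graph[NodeProps, Int],
                   updates: RDD[(VertexId, Int)]): Graph[NodeProps, Int] =
    graph.joinVertices(updates) { (_, _, newLabel) =>
      NodeProps(newLabel, isVisited = true)
    }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("batched-vertex-update").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Tiny illustrative graph: a path 1 - 2 - 3 (assumption, not the real data).
      val edges = sc.parallelize(Seq((1L, 2L), (2L, 3L)))
      val graph = Graph.fromEdgeTuples(edges, defaultValue = 0)
        .mapVertices((id, _) => NodeProps(id.toInt))
        .cache()

      // All changes decided during one iteration, gathered into one RDD.
      val updates = sc.parallelize(Seq((1L, 10), (3L, 30)))
      val updated = applyUpdates(graph, updates).cache()

      updated.vertices.collect().sortBy(_._1).foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

In an iterative algorithm, applyUpdates would be called once per iteration with that iteration's changes, rather than once per node.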

Source: https://stackoverflow.com/questions/60547724/what-is-best-structure-to-choose-for-updaing-nodes-property-in-spark-graphx
