How to implement code written with aggregateMessages using the Pregel API in Spark?

天大地大妈咪最大 submitted on 2021-01-28 05:20:04

Question


I have implemented a computation of the similarity between the nodes of a graph using aggregateMessages. For each edge, the common neighbors (the intersection) of its two endpoint nodes are computed, and the resulting similarity is sent to both of them. The message is a Double. Each node sums the messages it receives to obtain its own similarity sum. The similarity measure is the Jaccard similarity.

I have a graph whose structure looks like this:
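To make the measure concrete, here is a minimal sketch of the Jaccard similarity of two neighbor lists in plain Scala. The names `JaccardSketch` and `jaccard` are hypothetical helpers for illustration only, and returning 0.0 when both lists are empty is an assumption, not something stated above.

```scala
// Hypothetical helper, not part of GraphX: Jaccard similarity of two neighbor lists.
object JaccardSketch {
  def jaccard(a: Seq[Long], b: Seq[Long]): Double = {
    // Convert to sets so that duplicates are counted only once.
    val sa = a.toSet
    val sb = b.toSet
    val unionSize = (sa union sb).size
    // Assumption: two empty neighbor lists get similarity 0.0 instead of NaN.
    if (unionSize == 0) 0.0
    else (sa intersect sb).size.toDouble / unionSize
  }
}
```

For the neighbor lists [2,6,8,9] and [12,8,7,9], the intersection is {8, 9} and the union has 6 distinct elements, so the similarity is 2/6.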

(vertexID, List[neighbors ID])
(vertexID, List[neighbors ID])
(vertexID, List[neighbors ID])
...
(vertexID, List[neighbors ID])

For example:

(1, List[2,6,8,9])
(2, List[12,8,7,9])
(3, List[4,22,33,16])
...
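For reference, a graph of this shape could be built roughly as in the sketch below. This assumes one directed edge from each vertex to every ID in its neighbor list, a placeholder edge attribute of 1, and an empty neighbor list for vertices that appear only as neighbors. The object name `AdjacencyGraphSketch` and the method `build` are hypothetical. If the neighbor lists are symmetric, this construction yields two edges per pair, which would double the messages each node receives.

```scala
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

// Hypothetical helper: turns (vertexID, List[neighbor IDs]) rows into a GraphX graph.
object AdjacencyGraphSketch {
  def build(
      spark: SparkSession,
      adjacency: Seq[(VertexId, List[VertexId])]): Graph[List[VertexId], Int] = {
    val sc = spark.sparkContext

    // Vertex attribute = the vertex's own neighbor list.
    val vertices: RDD[(VertexId, List[VertexId])] = sc.parallelize(adjacency)

    // Assumption: one directed edge per (vertex, neighbor) pair, edge attribute 1.
    val edges: RDD[Edge[Int]] = vertices.flatMap { case (src, nbrs) =>
      nbrs.map(dst => Edge(src, dst, 1))
    }

    // Vertices that only occur as neighbors get an empty list as their attribute.
    Graph(vertices, edges, defaultVertexAttr = List.empty[VertexId])
  }
}
```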

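The code written with aggregateMessages is as follows:

    import org.apache.spark.graphx.VertexId
    import org.apache.spark.rdd.RDD

    val nodes_similarity_sum: RDD[(VertexId, Double)] = graph.aggregateMessages[Double](
      sendMsg = triplet => {
        // Each vertex attribute is that vertex's list of neighbor IDs.
        val srcNeighbor = triplet.srcAttr
        val dstNeighbor = triplet.dstAttr

        val temp_intersect = srcNeighbor.intersect(dstNeighbor).length
        // List.union only concatenates, so deduplicate to get the true union size.
        val temp_union = srcNeighbor.union(dstNeighbor).distinct.length
        val similarity = temp_intersect.toDouble / temp_union.toDouble

        // Send the same similarity to both endpoints of the edge.
        triplet.sendToDst(similarity)
        triplet.sendToSrc(similarity)
      },
      // Sum all similarities received by a vertex.
      mergeMsg = (x, y) => x + y
    )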

I believe that implementing this in Pregel would make it more optimized and faster, but I am having trouble with the implementation.

Can anyone implement it in Pregel? It would be very helpful, and challenging!
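One possible shape for this, offered as a rough sketch rather than a definitive answer: Pregel needs the running sum to live in the vertex attribute, so the sketch below pairs each neighbor list with a Double and runs a single superstep (`maxIterations = 1`). The names `PregelJaccardSketch`, `similaritySums` and `jaccard` are hypothetical, and the set-based union is an assumption about the intended Jaccard definition. Since the computation needs only one round of messages, it is not obvious that Pregel would be faster than a single aggregateMessages call. Pregel is an iterative wrapper around the same kind of message aggregation, and with `maxIterations = 1` it still computes one extra round of messages that is never used.

```scala
import scala.reflect.ClassTag

import org.apache.spark.graphx.{Graph, VertexId, VertexRDD}

// Hypothetical sketch: per-vertex sum of Jaccard similarities via one Pregel superstep.
object PregelJaccardSketch {

  // Assumption: set-based Jaccard, with 0.0 when both neighbor lists are empty.
  private def jaccard(a: List[VertexId], b: List[VertexId]): Double = {
    val sa = a.toSet
    val sb = b.toSet
    val unionSize = (sa union sb).size
    if (unionSize == 0) 0.0 else (sa intersect sb).size.toDouble / unionSize
  }

  def similaritySums[ED: ClassTag](graph: Graph[List[VertexId], ED]): VertexRDD[Double] = {
    // Vertex state = (neighbor list, running similarity sum).
    val withSum = graph.mapVertices[(List[VertexId], Double)]((_, nbrs) => (nbrs, 0.0))

    // Initial message 0.0 leaves every sum at 0.0; one iteration delivers the real messages.
    val result = withSum.pregel(0.0, maxIterations = 1)(
      // vprog: add the merged incoming similarity to the running sum.
      (_, attr, msg) => (attr._1, attr._2 + msg),
      // sendMsg: send the edge's similarity to both endpoints.
      triplet => {
        val sim = jaccard(triplet.srcAttr._1, triplet.dstAttr._1)
        Iterator((triplet.dstId, sim), (triplet.srcId, sim))
      },
      // mergeMsg: sum the similarities arriving at the same vertex.
      (x, y) => x + y
    )

    // Keep only the similarity sum for each vertex.
    result.vertices.mapValues((attr: (List[VertexId], Double)) => attr._2)
  }
}
```

Unlike aggregateMessages, which returns only the vertices that received a message, this returns every vertex, with 0.0 for vertices that have no edges.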

Source: https://stackoverflow.com/questions/61128932/how-to-implement-a-written-code-in-aggregatemessages-in-pregel-api-in-spark
