Question
I have implemented a node-similarity computation on a graph using aggregateMessages. For every edge, the intersection (the common neighbors) of the two endpoints is computed, and the resulting value, a Double, is sent to both endpoints. Each node then sums the messages it receives to obtain its own similarity sum. The similarity measure used is the Jaccard similarity.
My graph's structure looks like this:
(vertexID, List[neighbors ID])
(vertexID, List[neighbors ID])
(vertexID, List[neighbors ID])
...
(vertexID, List[neighbors ID])
For example:
(1, List[2,6,8,9])
(2, List[12,8,7,9])
(3, List[4,22,33,16])
...
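For instance, vertices 1 and 2 above share the neighbors 8 and 9, and the union of their neighbor lists has 6 elements, so their Jaccard similarity would be 2 / 6 ≈ 0.33.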
The aggregateMessages code is as follows:
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD

val nodes_similarity_sum: RDD[(VertexId, Double)] = graph.aggregateMessages[Double](
  // sendMsg receives an EdgeContext; compute the Jaccard similarity of the two endpoints
  sendMsg = ctx => {
    val srcNeighbors = ctx.srcAttr
    val dstNeighbors = ctx.dstAttr
    val intersectSize = srcNeighbors.intersect(dstNeighbors).length
    // distinct is needed because List.union keeps duplicates
    val unionSize = srcNeighbors.union(dstNeighbors).distinct.length
    val similarity = intersectSize.toDouble / unionSize
    // send the edge's similarity to both endpoints
    ctx.sendToDst(similarity)
    ctx.sendToSrc(similarity)
  },
  // each vertex sums all the similarities it receives
  mergeMsg = (x, y) => x + y
)
I believe that implementing this with Pregel would be better optimized and faster, but I am having trouble with the implementation.
Could anyone show how to implement it in the Pregel API? That would be very helpful.
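To show the direction I have in mind, below is a rough sketch of what I think a Pregel version could look like. This is only my guess, not code I have verified: it assumes the vertex attribute is replaced by a pair of (neighbor list, running similarity sum), runs a single superstep (maxIterations = 1), and the names workGraph, result and similaritySums are placeholders I made up.

import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD

// Carry a running similarity sum next to the neighbor list in the vertex attribute
val workGraph = graph.mapVertices((id, nbrs) => (nbrs, 0.0))

val result = workGraph.pregel(
    initialMsg = 0.0,                // added to every vertex before the first superstep
    maxIterations = 1,               // one superstep: the similarity only needs direct edges
    activeDirection = EdgeDirection.Either)(
  // vprog: fold the aggregated message into the running sum
  vprog = (id, attr, msg) => (attr._1, attr._2 + msg),
  // sendMsg: the same Jaccard computation as in aggregateMessages, sent to both endpoints
  sendMsg = triplet => {
    val inter = triplet.srcAttr._1.intersect(triplet.dstAttr._1).length
    val union = triplet.srcAttr._1.union(triplet.dstAttr._1).distinct.length
    val sim = inter.toDouble / union
    Iterator((triplet.srcId, sim), (triplet.dstId, sim))
  },
  // mergeMsg: sum the similarities arriving at each vertex
  mergeMsg = (x, y) => x + y
)

val similaritySums: RDD[(VertexId, Double)] = result.vertices.mapValues(_._2)

I am not sure this is correct, and as far as I understand Pregel itself calls aggregateMessages internally on each superstep, so I am also not sure a single-round computation like this would really be faster.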
Source: https://stackoverflow.com/questions/61128932/how-to-implement-a-written-code-in-aggregatemessages-in-pregel-api-in-spark