What's the difference between changing input arguments and creating a new object in Vprog of Spark GraphX?

Submitted by 穿精又带淫゛_ on 2020-05-17 06:24:18

Question


Here is my program:

import java.io.Serializable;
import scala.runtime.AbstractFunction3;

// Version 1: updates self in place, then returns a NEW OddRange holding the same values.
static class Vprog extends AbstractFunction3<Object, OddRange, OddRange, OddRange> implements Serializable {
    @Override
    public OddRange apply(Object l, OddRange self, OddRange sumOdd) {
        System.out.println(self.getS() + self.getI() + " ---> " + sumOdd.getS() + sumOdd.getI());
        self.setS(sumOdd.getS() + self.getS());
        self.setI(self.getI() + sumOdd.getI());
        return new OddRange(self.getS(), self.getI());
    }
}

The question is: if I return new OddRange(...) in Vprog as above, the vertexRDD is updated.

But if I return self instead, like this:

// Version 2: updates self in place and returns the SAME reference.
static class Vprog extends AbstractFunction3<Object, OddRange, OddRange, OddRange> implements Serializable {
    @Override
    public OddRange apply(Object l, OddRange self, OddRange sumOdd) {
        System.out.println(self.getS() + self.getI() + " ---> " + sumOdd.getS() + sumOdd.getI());
        self.setS(sumOdd.getS() + self.getS());
        self.setI(self.getI() + sumOdd.getI());
        return self;
    }
}

the vertexRDD does not change. I know RDDs are immutable, but how can I correctly update the vertexRDD in spark.graphx.Pregel? Can you give me any advice?

I found a similar question, "Spark Pregel is not working with Java", but I'm using Spark 2.3.0. Maybe it has the same problem?


Answer 1:


I think I have found the answer: Vprog must return a new object if we want the updated data to be used by sendMsg in the next iteration.
That's because Vprog changes the vertexRDD, but sendMsg uses the triplets (the tripletsRDD). Moreover, the vertices in the triplets are not the same as those in the vertexRDD; they are only copies of the vertexRDD. So the real question is when the vertices in the triplets get updated after the vertexRDD has changed.
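
To make the "copy" point concrete, here is a toy Java model. It is not GraphX code, and the names ReplicaCopyDemo, Attr, ship and run are hypothetical. One map plays the vertexRDD and another plays the vertex attributes replicated into the edge partitions that the triplets are built from; the replica is built from copies, roughly as shipping attributes between partitions produces copies. An in-place update on the vertex side is then invisible on the triplet side until that vertex is shipped again.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model, NOT GraphX code: "vertexSide" stands in for the
// vertexRDD and "tripletSide" for the vertex attributes replicated into the
// edge partitions from which sendMsg's triplets are built.
public class ReplicaCopyDemo {
    // Hypothetical mutable attribute standing in for OddRange.
    static final class Attr {
        long value;
        Attr(long value) { this.value = value; }
        // Roughly simulates the copy produced when attributes are shipped to edge partitions.
        Attr copy() { return new Attr(value); }
    }

    // Builds the triplet-side replica by copying every vertex attribute.
    static Map<Long, Attr> ship(Map<Long, Attr> vertexSide) {
        Map<Long, Attr> replica = new HashMap<>();
        for (Map.Entry<Long, Attr> e : vertexSide.entrySet()) {
            replica.put(e.getKey(), e.getValue().copy());
        }
        return replica;
    }

    // Returns {vertex-side value, triplet-side value} after an in-place
    // update that is never re-shipped.
    static long[] run() {
        Map<Long, Attr> vertexSide = new HashMap<>();
        vertexSide.put(1L, new Attr(10));
        Map<Long, Attr> tripletSide = ship(vertexSide);
        vertexSide.get(1L).value += 5; // update is visible on the vertex side only
        return new long[] {vertexSide.get(1L).value, tripletSide.get(1L).value};
    }

    public static void main(String[] args) {
        long[] r = run();
        System.out.println("vertex side:  " + r[0]); // 15
        System.out.println("triplet side: " + r[1]); // 10, stale until re-shipped
    }
}
```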

We can follow the source code along these two call chains to find the reason:
First part: pregel (Pregel.scala) -> joinVertices (GraphOps.scala) -> outerJoinVertices (GraphImpl.scala) -> diff (VertexRDDImpl.scala)
Second part: pregel (Pregel.scala) -> mapReduceTriplets (GraphXUtils.scala) -> aggregateMessagesWithActiveSet (GraphImpl.scala)

In the first part, I found that the VertexRDD from before Vprog runs is compared with the one from after (via diff). So if Vprog modifies the source object in place and returns it, the old and new values are the same object and compare as equal. Only the vertices that differ are then applied to a new ReplicatedVertexView (the graph's replicatedVertexView, via updateVertices); vertices that compare equal are skipped, so nothing is stored for them.
In the second part, the triplets are built from the vertex attributes held in that ReplicatedVertexView, and those triplets are what sendMsg uses.
So if Vprog does not return a new object, the triplets are not updated along with the VertexRDD, and the results will be wrong.
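
The comparison in the first part can be sketched as follows. As far as I can tell from the Spark source, diff compares each old and new vertex value with Scala's ==, which calls equals(); for a class that does not override equals() (presumably the case for the question's OddRange), that is reference identity. The toy below uses java.util.Objects.equals as a stand-in, and DiffDemo, Attr, mutateAndReturnSelf, returnNew and isChanged are hypothetical names.

```java
import java.util.Objects;

// Hypothetical, simplified model of the per-vertex check in diff:
// a vertex is kept as "changed" only if old and new values are not equal.
public class DiffDemo {
    // Hypothetical mutable attribute that does NOT override equals(),
    // so equality falls back to reference identity.
    static final class Attr {
        long s;
        Attr(long s) { this.s = s; }
    }

    // Vprog variant 1: mutate in place and return the same reference.
    static Attr mutateAndReturnSelf(Attr self, long msg) {
        self.s += msg;
        return self;
    }

    // Vprog variant 2: leave self alone and return a fresh object.
    static Attr returnNew(Attr self, long msg) {
        return new Attr(self.s + msg);
    }

    // True if the vertex would survive the diff and be re-shipped to the triplets.
    static boolean isChanged(Attr oldValue, Attr newValue) {
        return !Objects.equals(oldValue, newValue);
    }

    public static void main(String[] args) {
        Attr a = new Attr(1);
        System.out.println(isChanged(a, mutateAndReturnSelf(a, 2))); // false: same reference
        Attr b = new Attr(1);
        System.out.println(isChanged(b, returnNew(b, 2)));          // true: different object
    }
}
```
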
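
Following this conclusion, a safer pattern is to treat self as read-only and always build a fresh value. The sketch below assumes OddRange's S and I fields are longs and models it as a hypothetical immutable class OddRangeValue (with a field-based equals()); in a real job the body of vprog would go inside apply() of the AbstractFunction3 subclass from the question.

```java
import java.util.Objects;

public class ImmutableVprogDemo {
    // Hypothetical immutable stand-in for OddRange; field types are assumed to be long.
    static final class OddRangeValue {
        private final long s;
        private final long i;
        OddRangeValue(long s, long i) { this.s = s; this.i = i; }
        long getS() { return s; }
        long getI() { return i; }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof OddRangeValue)) return false;
            OddRangeValue other = (OddRangeValue) o;
            return s == other.s && i == other.i;
        }

        @Override
        public int hashCode() { return Objects.hash(s, i); }
    }

    // Same arithmetic as the question's Vprog, but self is only read, never written.
    static OddRangeValue vprog(long vertexId, OddRangeValue self, OddRangeValue sumOdd) {
        return new OddRangeValue(sumOdd.getS() + self.getS(), self.getI() + sumOdd.getI());
    }

    public static void main(String[] args) {
        OddRangeValue before = new OddRangeValue(1, 1);
        OddRangeValue after = vprog(7L, before, new OddRangeValue(3, 1));
        System.out.println(before.getS() + "," + before.getI()); // 1,1 (untouched)
        System.out.println(after.getS() + "," + after.getI());   // 4,2
        System.out.println(!before.equals(after));               // true: seen as changed
    }
}
```

Because self is never written, this version should stay correct even with a field-based equals(): an unchanged vertex compares equal and is skipped (its triplet copy is already current), while a changed one compares unequal and is shipped. The question's first Vprog, by contrast, mutates self before copying it, so with a field-based equals() the old and new values would presumably compare equal and the update could be lost again.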



Source: https://stackoverflow.com/questions/61380532/whats-the-difference-between-change-input-arguments-and-creating-a-new-object-i
