What is the efficient way to update value inside Spark's RDD?

Asked by 别那么骄傲 on 2020-12-29 12:35 · 3 answers · 914 views

I'm writing a graph-related program in Scala with Spark. The dataset has 4 million nodes and 4 million edges (you can treat this as a tree), but f

3 Answers
  •  孤城傲影
    2020-12-29 12:54

    As functional data structures, RDDs are immutable and an operation on an RDD generates a new RDD.

    Immutability does not necessarily mean full replication. Persistent data structures are a common functional pattern: operations on an immutable structure yield a new structure, while previous versions are preserved and large unchanged parts are reused.
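    To make the point concrete, here is a minimal sketch (names like `nodes` and `bump` are hypothetical, not from the question) of what "updating" a value in an RDD looks like in practice: you derive a new RDD with a transformation such as `map`, and the original RDD remains untouched.

    ```scala
    import org.apache.spark.rdd.RDD

    // Pure update logic, kept separate from the Spark plumbing so it is easy to test.
    // Increments the value of the node whose id matches `target`; leaves others as-is.
    def bump(id: Long, value: Int, target: Long): Int =
      if (id == target) value + 1 else value

    // Assuming an existing SparkContext `sc` and an RDD of (nodeId, value) pairs:
    // val nodes: RDD[(Long, Int)] = sc.parallelize(Seq((1L, 10), (2L, 20)))

    // `map` does not mutate `nodes`; it lazily describes a NEW RDD.
    // val updated: RDD[(Long, Int)] =
    //   nodes.map { case (id, v) => (id, bump(id, v, target = 1L)) }
    ```

    The transformation is lazy, so no data moves until an action (e.g. `count`, `collect`) runs, and `nodes` stays valid for other computations.
    
    
    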

    GraphX, a graph API built on top of Spark, uses exactly this concept. From the docs:

    Changes to the values or structure of the graph are accomplished by producing a new graph with the desired changes. Note that substantial parts of the original graph (i.e., unaffected structure, attributes, and indices) are reused in the new graph, reducing the cost of this inherently functional data structure.

    It might be a solution for the problem at hand: http://spark.apache.org/docs/1.0.0/graphx-programming-guide.html
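    A hedged sketch of what that looks like with GraphX (the graph itself and the update function are assumptions for illustration): `mapVertices` transforms only the vertex attributes, and the resulting graph shares the unchanged edge structure and indices with the original instead of copying them.

    ```scala
    import org.apache.spark.graphx.{Graph, VertexId}

    // Pure attribute-update function: doubles every vertex's value.
    def doubleVertex(id: VertexId, attr: Int): Int = attr * 2

    // Assuming `graph: Graph[Int, Int]` was built elsewhere,
    // e.g. Graph(vertexRDD, edgeRDD):
    // val updatedGraph: Graph[Int, Int] = graph.mapVertices(doubleVertex)
    //
    // `updatedGraph` is a new Graph; `graph` is unchanged, and the edge
    // structure/indices are reused rather than replicated.
    ```

    For the 4-million-node tree in the question, this reuse is what keeps per-update cost proportional to what actually changed rather than to the whole graph.
    
    
    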
