spark-graphx

Why can't I change node properties using the map function in Spark?

Submitted by 本秂侑毒 on 2020-03-28 07:00:47
Question: I am working with GraphX in Spark to process a graph. I have a val common_neighbors: RDD[VertexId] that holds some vertex IDs. I use the map function to transform it into a structure such as (node, 1), where node is the vertex ID and 1 is its initial property. The transformation code is: val p = common_neighbors.map(x => (x, 1)). I have a graph with a structure such as (node, node_property(label, isDefined)), for example (1, (14, true)): the node with ID=1 has label=14 and …
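A map over an RDD alone never changes the graph, because GraphX graphs are immutable; the usual fix is to join the (node, 1) pairs back into a new graph with outerJoinVertices. A minimal sketch, assuming the (label, isDefined) property layout from the question; graph and common_neighbors are placeholders for the asker's values:

```scala
// (label, isDefined) pair, e.g. (14, true)
type NodeProp = (Int, Boolean)

// Pure update rule: a vertex that appears in the (node, 1) RDD becomes defined
def mark(prop: NodeProp, hit: Option[Int]): NodeProp =
  if (hit.isDefined) (prop._1, true) else prop

// With Spark: map alone cannot mutate the graph; joining the pairs back
// produces a NEW graph, leaving the old one untouched:
//   val p = common_neighbors.map(x => (x, 1))
//   val updated = graph.outerJoinVertices(p)((_, prop, hit) => mark(prop, hit))
```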

Spark Scala GraphX: Shortest path between two vertices

Submitted by 岁酱吖の on 2020-03-19 06:10:58
问题 I have a directed graph G in Spark GraphX (Scala). I would like to find the number of edges that should be crossed starting from a known vertex v1 to arrive in another vertex v2 . In other words, I need the shortest path from the vertex v1 to the vertex v2 calculated in number of edges (not using the edges' weights). I was looking at the GraphX documentation, but I wasn't able to find a method to do it. This is also needed in order to compute the depth of the graph if it has a tree structure.
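GraphX does ship an unweighted shortest-path helper: org.apache.spark.graphx.lib.ShortestPaths runs a BFS-style Pregel computation that gives every vertex its hop count to a set of landmark vertices. A sketch under that assumption; graph, v1 and v2 stand for the asker's values:

```scala
// Each vertex's result is a Map[VertexId, Int] of hop counts to the landmarks;
// a missing key means the landmark is unreachable from that vertex.
type SPMap = Map[Long, Int]

def hopsTo(sp: SPMap, target: Long): Option[Int] = sp.get(target)

// With Spark (assumed names: graph, v1, v2):
//   import org.apache.spark.graphx.lib.ShortestPaths
//   val result = ShortestPaths.run(graph, Seq(v2)) // counts edges, ignores weights
//   val hops = result.vertices.lookup(v1).headOption.flatMap(sp => hopsTo(sp, v2))
//   // Some(k): v2 is k edges away from v1; None: unreachable
```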

What is the best structure for updating node properties in Spark GraphX?

Submitted by 丶灬走出姿态 on 2020-03-11 14:49:07
Question: For a while I have been searching for a way to update node properties in GraphX. I am working on a graph that consists of nodes and node properties, for example (1, (2, true)): here 1 is the node ID, 2 is the node's label, and true indicates that the node has been visited. I loaded the graph with GraphLoader and built a distributed graph from RDDs. The structure I use for every node is: case class nodes_properties(label: Int, isVisited: Boolean = false) var work_graph = graph …
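Because a Graph is immutable, "updating" a property means producing a new graph whose vertex values are rewritten, typically with mapVertices plus a case-class copy. A sketch using the nodes_properties class from the question; work_graph and target are placeholders:

```scala
case class nodes_properties(label: Int, isVisited: Boolean = false)

// Pure update: copy keeps the label and flips only the visited flag
def visit(p: nodes_properties): nodes_properties = p.copy(isVisited = true)

// With Spark (assumed names: work_graph, target):
//   val updated = work_graph.mapVertices { (id, p) =>
//     if (id == target) visit(p) else p
//   }
// `updated` is a new Graph; reassign it to the var to carry it forward.
```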

Comparing the intersection of two nodes' neighbors using a broadcast variable versus RDD.filter in Spark GraphX

Submitted by 梦想的初衷 on 2020-03-04 05:03:11
Question: I work with graphs in GraphX. Using the code below I created a variable that stores each node's neighbors in an RDD: val all_neighbors: VertexRDD[Array[VertexId]] = graph.collectNeighborIds(EdgeDirection.Either). I then used a broadcast variable to ship the neighbors to all workers: val broadcastVar = all_neighbors.collect().toMap val nvalues = sc.broadcast(broadcastVar). I want to compute the intersection between two nodes' neighbor sets, for example the intersection between node 1 and node 2 …
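Once the neighbor map is broadcast, the intersection itself is plain array logic evaluated locally on each worker; no extra RDD.filter pass is needed. A sketch assuming nvalues is the broadcast Map[VertexId, Array[VertexId]] from the question:

```scala
// VertexId is just an alias for Long in GraphX; aliased here so the sketch
// is self-contained.
type VertexId = Long

// Pure intersection of two neighbor lists
def commonNeighbors(a: Array[VertexId], b: Array[VertexId]): Array[VertexId] =
  a.intersect(b)

// With the broadcast value, inside any task (assumed name: nvalues):
//   val inter = commonNeighbors(nvalues.value.getOrElse(1L, Array.empty),
//                               nvalues.value.getOrElse(2L, Array.empty))
```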

Spark GraphX: time cost per round increases linearly

Submitted by 旧巷老猫 on 2020-01-17 01:14:29
Question: I use the GraphX API in an iterative algorithm. Although I carefully cache/unpersist RDDs and take care of the vertex partition count, the time cost still increases linearly per round. A simplified version of my code, which shows the same problem, follows: import org.apache.log4j.{Level, Logger} import org.apache.spark.graphx.Graph import org.apache.spark.graphx.util.GraphGenerators import org.apache.spark.sql.SQLContext import org.apache.spark.{SparkConf, SparkContext}
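Linearly growing round times in an iterative GraphX job are usually a symptom of growing RDD lineage: each round's graph depends on all previous rounds unless the chain is cut. The common pattern is to materialize the new graph, unpersist the old one, and checkpoint periodically. A sketch of that pattern under those assumptions; g, step and i are placeholders for the asker's loop:

```scala
// Checkpoint every k rounds to truncate the growing RDD lineage
def shouldCheckpoint(round: Int, every: Int = 20): Boolean =
  every > 0 && round % every == 0

// With Spark, the usual per-round body (assumed names: g, step, i):
//   val next = step(g).cache()
//   next.vertices.count()                 // materialize BEFORE dropping the old graph
//   g.unpersistVertices(blocking = false) // otherwise old rounds pile up in memory
//   g.edges.unpersist(blocking = false)
//   g = next
//   if (shouldCheckpoint(i)) g.checkpoint() // requires sc.setCheckpointDir(...) first
```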

Creating arrays per executor in Spark and combining them into an RDD

Submitted by 纵饮孤独 on 2020-01-11 11:10:54
Question: I am moving from MPI-based systems to Apache Spark, and I need to do the following. Suppose I have n vertices and want to create an edge list from them. An edge is just a tuple of two integers (u, v); no attributes are required. However, I want to create the edges in parallel, independently in each executor: P edge arrays for P Spark executors. Each array may have a different size and depends on the vertices, so I also need the …
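The Spark analogue of "one array per process" is mapPartitionsWithIndex: each partition builds its own edge collection locally, and the union of all partitions is the resulting RDD. A sketch where the pairing rule is purely hypothetical (here: connect consecutive vertex IDs within a partition) and stands in for the asker's real rule:

```scala
// Pure, per-partition edge generator; the consecutive-pair rule below is a
// stand-in for whatever rule the real application uses.
def edgesFor(vertices: Seq[Long]): Seq[(Long, Long)] =
  vertices.sliding(2).collect { case Seq(u, v) => (u, v) }.toSeq

// With Spark (assumed names: sc, n, numPartitions):
//   val edges = sc.parallelize(0L until n, numPartitions)
//     .mapPartitionsWithIndex { (pid, it) => edgesFor(it.toSeq).iterator }
//   val graph = Graph.fromEdgeTuples(edges, defaultValue = 1)
```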

EdgeTriplets are not getting broadcast properly

Submitted by 人走茶凉 on 2020-01-07 06:26:13
Question: I created a graph using GraphX and now I need to extract sub-graphs from the original graph. In the following code I am trying to broadcast the edge triplets and filter them for each user ID: class VertexProperty(val id: Long) extends Serializable case class User(val userId: Long, var offset: Int, val userCode: String, val Name: String, val Surname: String, val organizational_unit: String, val UME: String, val person_type: String, val SOD_HIGH: String, val SOD_MEDIUM: String, val SOD_LOW: String, val Under …
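Rather than collecting and broadcasting the triplets, GraphX's subgraph operator filters triplets in a distributed way, which sidesteps the broadcast problem entirely. A sketch assuming the goal is "edges that touch a given user ID"; graph and uid are placeholders:

```scala
// Pure predicate: does an edge touch the given user?
def touches(srcId: Long, dstId: Long, uid: Long): Boolean =
  srcId == uid || dstId == uid

// With Spark (assumed names: graph, uid): subgraph keeps only the triplets
// matching the edge predicate, without collecting anything to the driver.
//   val sub = graph.subgraph(epred = t => touches(t.srcId, t.dstId, uid))
```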

Gremlin, Giraph, or GraphX on TitanDB?

Submitted by 橙三吉。 on 2020-01-01 05:38:08
Question: I need some help to confirm my choice, and I would appreciate any information you can give me. My storage database is TitanDB with Cassandra, and I have a very large graph. My goal is to use MLlib on the graph later. My first idea was to use Titan with GraphX, but I found nothing finished, only work in progress: TinkerPop is not ready yet. So I had a look at Giraph. Titan can communicate with Rexster from TinkerPop. My question is: what are the benefits of using Giraph? Gremlin seems …

How to read data from a file as a Graph (GraphX)?

Submitted by 放肆的年华 on 2019-12-25 02:46:57
Question: I'm new to Scala and am trying to read an undirected graph from a text file as a GraphX Graph. The text file has the format: 1,8,9,10 2,5,6,7,3,1 meaning that node 1 is connected to nodes 8, 9 and 10 (an adjacency list) and node 2 is connected to nodes 5, 6, 7, 3 and 1. I am trying to accomplish this using the fromEdges[VD, ED] method (GraphX), where I have to pass pairs of edges. val graph = sc.textFile("Path to file").map(line => line.split(",").map(line= …
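One way to read such an adjacency-list file, assuming each line is "node,neighbor,neighbor,...": split each line into (head, neighbor) pairs, then build the graph with Graph.fromEdgeTuples, the variant that accepts plain vertex-ID pairs instead of Edge objects. Names sc and path are placeholders:

```scala
// Pure parser: "1,8,9,10" -> Seq((1,8), (1,9), (1,10))
def parseLine(line: String): Seq[(Long, Long)] = {
  val ids = line.split(",").map(_.trim.toLong)
  ids.tail.toSeq.map(n => (ids.head, n))
}

// With Spark (assumed names: sc, path):
//   val edges = sc.textFile(path).flatMap(parseLine)
//   val graph = Graph.fromEdgeTuples(edges, defaultValue = 1)
// GraphX stores edges as directed; for an undirected adjacency list, treat
// each pair as bidirectional or query with EdgeDirection.Either.
```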