spark-graphx

Why can't I change node properties using the map function in Spark?

Submitted by 本秂侑毒 on 2020-03-28 07:00:47
Question: I am working with GraphX in Spark to process a graph. I have a val common_neighbors: RDD[VertexId] that holds some vertex IDs. I use the map function to transform it into a structure such as (node, 1), where node is the vertex ID and 1 is its initial property. The transformation code is: val p = common_neighbors.map(x => (x, 1)). I have a graph with a structure such as (node, node_property(label, isDefined)), for example (1, (14, true)): the node with ID=1 has label=14 and …
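A map over an RDD alone never changes the graph, because GraphX graphs are immutable; the usual fix is to join the (node, 1) pairs back into a new graph with outerJoinVertices. A minimal sketch, assuming the (label, isDefined) property layout from the question; graph and common_neighbors are placeholders for the asker's values:

```scala
// (label, isDefined) pair, e.g. (14, true)
type NodeProp = (Int, Boolean)

// Pure update rule: a vertex that appears in the (node, 1) RDD becomes defined
def mark(prop: NodeProp, hit: Option[Int]): NodeProp =
  if (hit.isDefined) (prop._1, true) else prop

// With Spark: map alone cannot mutate the graph; joining the pairs back
// produces a NEW graph, leaving the old one untouched:
//   val p = common_neighbors.map(x => (x, 1))
//   val updated = graph.outerJoinVertices(p)((_, prop, hit) => mark(prop, hit))
```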

Spark Scala GraphX: Shortest path between two vertices

Submitted by 岁酱吖の on 2020-03-19 06:10:58
问题 I have a directed graph G in Spark GraphX (Scala). I would like to find the number of edges that should be crossed starting from a known vertex v1 to arrive in another vertex v2 . In other words, I need the shortest path from the vertex v1 to the vertex v2 calculated in number of edges (not using the edges' weights). I was looking at the GraphX documentation, but I wasn't able to find a method to do it. This is also needed in order to compute the depth of the graph if it has a tree structure.
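GraphX does ship an unweighted shortest-path helper: org.apache.spark.graphx.lib.ShortestPaths runs a BFS-style Pregel computation that gives every vertex its hop count to a set of landmark vertices. A sketch under that assumption; graph, v1 and v2 stand for the asker's values:

```scala
// Each vertex's result is a Map[VertexId, Int] of hop counts to the landmarks;
// a missing key means the landmark is unreachable from that vertex.
type SPMap = Map[Long, Int]

def hopsTo(sp: SPMap, target: Long): Option[Int] = sp.get(target)

// With Spark (assumed names: graph, v1, v2):
//   import org.apache.spark.graphx.lib.ShortestPaths
//   val result = ShortestPaths.run(graph, Seq(v2)) // counts edges, ignores weights
//   val hops = result.vertices.lookup(v1).headOption.flatMap(sp => hopsTo(sp, v2))
//   // Some(k): v2 is k edges away from v1; None: unreachable
```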

What is the best structure for updating node properties in Spark GraphX?

Submitted by 丶灬走出姿态 on 2020-03-11 14:49:07
Question: For a while I have been searching for a way to update node properties in GraphX. I am working on a graph that consists of nodes and node properties, for example (1, (2, true)): here 1 is the node ID, 2 is the node's label, and true indicates that the node has been visited. I loaded the graph with GraphLoader and built a distributed graph from RDDs. The structure I use for every node is: case class nodes_properties(label: Int, isVisited: Boolean = false) var work_graph = graph …
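Because a Graph is immutable, "updating" a property means producing a new graph whose vertex values are rewritten, typically with mapVertices plus a case-class copy. A sketch using the nodes_properties class from the question; work_graph and target are placeholders:

```scala
case class nodes_properties(label: Int, isVisited: Boolean = false)

// Pure update: copy keeps the label and flips only the visited flag
def visit(p: nodes_properties): nodes_properties = p.copy(isVisited = true)

// With Spark (assumed names: work_graph, target):
//   val updated = work_graph.mapVertices { (id, p) =>
//     if (id == target) visit(p) else p
//   }
// `updated` is a new Graph; reassign it to the var to carry it forward.
```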

Comparing the intersection of two nodes' neighbors using a broadcast variable versus RDD.filter in Spark GraphX

Submitted by 梦想的初衷 on 2020-03-04 05:03:11
Question: I work with graphs in GraphX. Using the code below I created a variable that stores each node's neighbors in an RDD: val all_neighbors: VertexRDD[Array[VertexId]] = graph.collectNeighborIds(EdgeDirection.Either). I then used a broadcast variable to ship the neighbors to all workers: val broadcastVar = all_neighbors.collect().toMap val nvalues = sc.broadcast(broadcastVar). I want to compute the intersection between two nodes' neighbor sets, for example the intersection between node 1 and node 2 …
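Once the neighbor map is broadcast, the intersection itself is plain array logic evaluated locally on each worker; no extra RDD.filter pass is needed. A sketch assuming nvalues is the broadcast Map[VertexId, Array[VertexId]] from the question:

```scala
// VertexId is just an alias for Long in GraphX; aliased here so the sketch
// is self-contained.
type VertexId = Long

// Pure intersection of two neighbor lists
def commonNeighbors(a: Array[VertexId], b: Array[VertexId]): Array[VertexId] =
  a.intersect(b)

// With the broadcast value, inside any task (assumed name: nvalues):
//   val inter = commonNeighbors(nvalues.value.getOrElse(1L, Array.empty),
//                               nvalues.value.getOrElse(2L, Array.empty))
```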

Spark GraphX: time cost per round increases linearly

Submitted by 旧巷老猫 on 2020-01-17 01:14:29
Question: I use the GraphX API in an iterative algorithm. Although I carefully cache/unpersist RDDs and take care of the vertex partition count, the time cost still increases linearly per round. A simplified version of my code, which shows the same problem, follows: import org.apache.log4j.{Level, Logger} import org.apache.spark.graphx.Graph import org.apache.spark.graphx.util.GraphGenerators import org.apache.spark.sql.SQLContext import org.apache.spark.{SparkConf, SparkContext}
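Linearly growing round times in an iterative GraphX job are usually a symptom of growing RDD lineage: each round's graph depends on all previous rounds unless the chain is cut. The common pattern is to materialize the new graph, unpersist the old one, and checkpoint periodically. A sketch of that pattern under those assumptions; g, step and i are placeholders for the asker's loop:

```scala
// Checkpoint every k rounds to truncate the growing RDD lineage
def shouldCheckpoint(round: Int, every: Int = 20): Boolean =
  every > 0 && round % every == 0

// With Spark, the usual per-round body (assumed names: g, step, i):
//   val next = step(g).cache()
//   next.vertices.count()                 // materialize BEFORE dropping the old graph
//   g.unpersistVertices(blocking = false) // otherwise old rounds pile up in memory
//   g.edges.unpersist(blocking = false)
//   g = next
//   if (shouldCheckpoint(i)) g.checkpoint() // requires sc.setCheckpointDir(...) first
```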

Creating arrays per executor in Spark and combining them into an RDD

Submitted by 纵饮孤独 on 2020-01-11 11:10:54
Question: I am moving from MPI-based systems to Apache Spark, and I need to do the following. Suppose I have n vertices and want to create an edge list from them. An edge is just a tuple of two integers (u, v); no attributes are required. However, I want to create the edges in parallel, independently in each executor: P edge arrays for P Spark executors. Each array may have a different size and depends on the vertices, so I also need the …
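The Spark analogue of "one array per process" is mapPartitionsWithIndex: each partition builds its own edge collection locally, and the union of all partitions is the resulting RDD. A sketch where the pairing rule is purely hypothetical (here: connect consecutive vertex IDs within a partition) and stands in for the asker's real rule:

```scala
// Pure, per-partition edge generator; the consecutive-pair rule below is a
// stand-in for whatever rule the real application uses.
def edgesFor(vertices: Seq[Long]): Seq[(Long, Long)] =
  vertices.sliding(2).collect { case Seq(u, v) => (u, v) }.toSeq

// With Spark (assumed names: sc, n, numPartitions):
//   val edges = sc.parallelize(0L until n, numPartitions)
//     .mapPartitionsWithIndex { (pid, it) => edgesFor(it.toSeq).iterator }
//   val graph = Graph.fromEdgeTuples(edges, defaultValue = 1)
```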

EdgeTriplets are not getting broadcast properly

Submitted by 人走茶凉 on 2020-01-07 06:26:13
Question: I created a graph using GraphX and now I need to extract sub-graphs from the original graph. In the following code I am trying to broadcast the edge triplets and filter them for each user ID: class VertexProperty(val id: Long) extends Serializable case class User(val userId: Long, var offset: Int, val userCode: String, val Name: String, val Surname: String, val organizational_unit: String, val UME: String, val person_type: String, val SOD_HIGH: String, val SOD_MEDIUM: String, val SOD_LOW: String, val Under …
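Rather than collecting and broadcasting the triplets, GraphX's subgraph operator filters triplets in a distributed way, which sidesteps the broadcast problem entirely. A sketch assuming the goal is "edges that touch a given user ID"; graph and uid are placeholders:

```scala
// Pure predicate: does an edge touch the given user?
def touches(srcId: Long, dstId: Long, uid: Long): Boolean =
  srcId == uid || dstId == uid

// With Spark (assumed names: graph, uid): subgraph keeps only the triplets
// matching the edge predicate, without collecting anything to the driver.
//   val sub = graph.subgraph(epred = t => touches(t.srcId, t.dstId, uid))
```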

Gremlin, Giraph, or GraphX on TitanDB?

Submitted by 橙三吉。 on 2020-01-01 05:38:08
Question: I need some help to confirm my choice, and I would appreciate any information you can give me. My storage database is TitanDB with Cassandra, and I have a very large graph. My goal is to use MLlib on the graph later. My first idea was to use Titan with GraphX, but I found nothing finished, only work in progress: TinkerPop is not ready yet. So I had a look at Giraph. Titan can communicate with Rexster from TinkerPop. My question is: what are the benefits of using Giraph? Gremlin seems …

How to read data from a file as a Graph (GraphX)?

Submitted by 放肆的年华 on 2019-12-25 02:46:57
Question: I'm new to Scala and am trying to read an undirected graph from a text file as a GraphX Graph. The text file has the format: 1,8,9,10 2,5,6,7,3,1 meaning that node 1 is connected to nodes 8, 9 and 10 (an adjacency list) and node 2 is connected to nodes 5, 6, 7, 3 and 1. I am trying to accomplish this using the fromEdges[VD, ED] method (GraphX), where I have to pass pairs of edges. val graph = sc.textFile("Path to file").map(line => line.split(",").map(line= …
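One way to read such an adjacency-list file, assuming each line is "node,neighbor,neighbor,...": split each line into (head, neighbor) pairs, then build the graph with Graph.fromEdgeTuples, the variant that accepts plain vertex-ID pairs instead of Edge objects. Names sc and path are placeholders:

```scala
// Pure parser: "1,8,9,10" -> Seq((1,8), (1,9), (1,10))
def parseLine(line: String): Seq[(Long, Long)] = {
  val ids = line.split(",").map(_.trim.toLong)
  ids.tail.toSeq.map(n => (ids.head, n))
}

// With Spark (assumed names: sc, path):
//   val edges = sc.textFile(path).flatMap(parseLine)
//   val graph = Graph.fromEdgeTuples(edges, defaultValue = 1)
// GraphX stores edges as directed; for an undirected adjacency list, treat
// each pair as bidirectional or query with EdgeDirection.Either.
```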