giraph

Convert csv data to graph data

狂风中的少年 submitted on 2020-06-28 03:46:28
Question: I am experimenting with Apache Giraph. I need to create a simple graph from a CSV file residing in HDFS that shows the relationship between two columns (victim related to store name). My data is over 1 GB in CSV format. I initially tried Neo4j from Java with a local file, but it could only load small data sets and cannot import data directly from HDFS. My data may grow, so I thought of using Apache Giraph. But how can I achieve the same thing? As far as I can tell, Apache Giraph only takes input in a vertex format
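One way to bridge a two-column CSV and a vertex-oriented input is to pre-convert it into one JSON adjacency line per vertex, `[id, value, [[dest_id, edge_value], ...]]`, which is the text layout read by Giraph's JsonLongDoubleFloatDoubleVertexInputFormat. The sketch below is only an illustration under several assumptions: the column names `victim` and `store_name`, the unit edge weight, the zero vertex value, and the helper name `csv_to_giraph_lines` are all hypothetical, and string labels are mapped to numeric ids because that format expects long ids. It holds everything in memory, so for a file of 1 GB or more in HDFS the same mapping would have to be done in a distributed job rather than in a single script.

```python
import csv
import io
import json
from collections import defaultdict


def csv_to_giraph_lines(csv_text, src_col="victim", dst_col="store_name"):
    """Turn a two-column CSV into JSON adjacency lines, one per vertex.

    Column names are assumptions for illustration, not taken from real data.
    """
    ids = {}                   # string label -> numeric id, in order of first appearance
    edges = defaultdict(list)  # source id -> list of [target id, edge weight]

    def vertex_id(label):
        if label not in ids:
            ids[label] = len(ids)
        return ids[label]

    for row in csv.DictReader(io.StringIO(csv_text)):
        src = vertex_id(row[src_col])
        dst = vertex_id(row[dst_col])
        edges[src].append([dst, 1.0])  # assumed unit weight per relationship

    # Emit every vertex, including ones that only ever appear as targets.
    lines = [json.dumps([vid, 0.0, edges.get(vid, [])]) for vid in sorted(ids.values())]
    return lines, ids


sample = "victim,store_name\nalice,StoreA\nbob,StoreA\nalice,StoreB\n"
lines, ids = csv_to_giraph_lines(sample)
for line in lines:
    print(line)
```

The returned `ids` dictionary would need to be kept alongside the output so that numeric vertex ids can be translated back to the original names after the job finishes.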

Giraph's estimated cluster heap 4096MB ask is greater than the current available cluster heap of 0MB. Aborting Job

自作多情 submitted on 2020-01-07 04:50:49
Question: I'm running Giraph using Hadoop 2.5.2 on a 5-node cluster. But when I try to run the SimpleShortestPathsComputation example, I get this error: Exception in thread "main" java.lang.IllegalStateException: Giraph's estimated cluster heap 2000MB ask is greater than the current available cluster heap of 0MB. Aborting Job. So far I've been unable to determine why Giraph thinks the cluster has a 0MB heap. I've set YARN_HEAPSIZE and HADOOP_HEAPSIZE in yarn-env.sh and hadoop-env.sh respectively, and

Apache Giraph - Cannot run in split master / worker mode since there is only 1 task at a time

人走茶凉 submitted on 2020-01-02 04:07:06
Question: I ran Giraph 1.0.0 with Hadoop 2.2.0 using the PageRank Benchmark example here. I suddenly got this error: Exception in thread "main" java.lang.IllegalArgumentException: checkLocalJobRunnerConfiguration: When using LocalJobRunner, must have only one worker since only 1 task at a time! at org.apache.giraph.job.GiraphJob.checkLocalJobRunnerConfiguration(GiraphJob.java:151) at org.apache.giraph.job.GiraphJob.run(GiraphJob.java:225) at org.apache.giraph.benchmark.GiraphBenchmark.run

Gremlin - Giraph - GraphX? On TitanDb

橙三吉。 submitted on 2020-01-01 05:38:08
Question: I need some help confirming my choice, and I would appreciate any information you can give me. My storage database is TitanDb with Cassandra. I have a very large graph. My goal is to use MLlib on the graph later. My first idea was to use Titan with GraphX, but I did not find anything, either released or in development. TinkerPop is not ready yet. So I had a look at Giraph. Titan can communicate with Rexster from TinkerPop. My question is: what are the benefits of using Giraph? Gremlin seems

Giraph's best vertex input format for an input file with ids of type String

£可爱£侵袭症+ submitted on 2019-12-24 14:23:24
Question: I have a multinode Giraph cluster working properly on my PC. I ran the SimpleShortestPathExample from Giraph and it executed fine. The algorithm was run with this file (tiny_graph.txt): [0,0,[[1,1],[3,3]]] [1,0,[[0,1],[2,2],[3,1]]] [2,0,[[1,2],[4,4]]] [3,0,[[0,3],[1,1],[4,4]]] [4,0,[[3,4],[2,4]]] This file has the following input format: [source_id,source_value,[[dest_id, edge_value],...]] Now I'm trying to execute the same algorithm, in the same cluster, but with an input file
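To make the `[source_id,source_value,[[dest_id, edge_value],...]]` layout concrete, here is a small sketch that reads the tiny_graph.txt lines quoted above as plain JSON arrays. It only illustrates the file layout; it is not Giraph's reader, and the helper name `parse_vertex_line` is hypothetical. The last call shows that the same layout can carry string ids as far as JSON is concerned, although on the Giraph side that would still require a vertex input format whose id type is textual rather than long.

```python
import json

# The five lines of tiny_graph.txt quoted in the question.
TINY_GRAPH = """\
[0,0,[[1,1],[3,3]]]
[1,0,[[0,1],[2,2],[3,1]]]
[2,0,[[1,2],[4,4]]]
[3,0,[[0,3],[1,1],[4,4]]]
[4,0,[[3,4],[2,4]]]
"""


def parse_vertex_line(line):
    """Split one JSON line into (source id, source value, {dest id: edge value})."""
    source_id, source_value, raw_edges = json.loads(line)
    edges = {dest_id: edge_value for dest_id, edge_value in raw_edges}
    return source_id, source_value, edges


graph = {}
for line in TINY_GRAPH.splitlines():
    vid, value, edges = parse_vertex_line(line)
    graph[vid] = edges

print(graph[1])  # outgoing edges of vertex 1

# Same layout with string ids (an assumed variant of the file, not from the question).
string_vertex = parse_vertex_line('["a",0,[["b",1]]]')
print(string_vertex)
```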

Apache Giraph build error

二次信任 submitted on 2019-12-24 00:07:11
Question: I got the following error when compiling Giraph. I'm using Ubuntu 16.04 with Java 1.8 and Maven 3.3.9. Here is the output of the mvn -version command: Apache Maven 3.3.9 Maven home: /usr/share/maven Java version: 1.8.0_171, vendor: Oracle Corporation Java home: /usr/lib/jvm/java-8-openjdk-amd64/jre I cloned with the following command: git clone http://git-wip-us.apache.org/repos/asf/giraph.git Then I tried the following Maven commands, but I always got the same error. Could you please tell me what my mistake is? 1°

java.io.IOException: ensureRemaining: Only 0 bytes remaining, trying to read 1

不问归期 submitted on 2019-12-23 12:43:42
Question: I'm having some problems with custom classes in Giraph. I wrote a vertex input format and an output format, but I always get the following error: java.io.IOException: ensureRemaining: Only * bytes remaining, trying to read * with different values in place of each "*". This was tested on a single-node cluster. The problem happens when a vertexIterator calls next() and there are no vertices left. This iterator is invoked from a flush method, but I don't understand, basically, why the "next

Giraph ZooKeeper port problems

[亡魂溺海] submitted on 2019-12-11 20:55:25
Question: I am trying to run the SimpleShortestPathsVertex (aka SimpleShortestPathComputation) example described in the Giraph Quick Start. I am running this on a Hortonworks Sandbox instance (HDP 2.1) using VirtualBox, and I packaged giraph.jar using profile hadoop_2.0.0. When I try to run the example using hadoop jar giraph.jar org.apache.giraph.GiraphRunner org.apache.giraph.examples.SimpleShortestPathsVertex -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat -vip /user/hue

Vertices with complex values in Apache Giraph

走远了吗. submitted on 2019-12-11 11:47:39
Question: I am trying to read a text file containing relevant vertex information into Giraph. Each line is vertex_id attribute_1 attribute_2 ... attribute_n, where each attribute is a string. The goal is to create a vertex where all these attributes are part of the vertex's value. Looking through the various input formats, I could not find anything out of the box, so I assume I have to derive my vertex input class from VertexValueInputFormat (I have a separate reader for edges). Problem is: how? I
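As a rough illustration of the data shape described here, the sketch below splits one `vertex_id attribute_1 ... attribute_n` line into an id and a composite value holding all attributes. It is not a Giraph input format: the names `VertexRecord` and `parse_attribute_line` are hypothetical, whitespace-separated fields are an assumption, and in Giraph itself the composite value would have to be a custom Writable class populated by the reader of the derived input format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VertexRecord:
    """Hypothetical composite vertex: an id plus all of its string attributes."""
    vertex_id: str
    attributes: List[str] = field(default_factory=list)


def parse_attribute_line(line):
    """Parse 'vertex_id attribute_1 ... attribute_n' (assumed whitespace-separated)."""
    fields = line.split()
    if not fields:
        raise ValueError("empty line: expected at least a vertex id")
    return VertexRecord(vertex_id=fields[0], attributes=fields[1:])


record = parse_attribute_line("42 red large metal")
print(record)
```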

ClassNotFoundException running GiraphRunner on a modified SimpleShortestPathsVertex

♀尐吖头ヾ submitted on 2019-12-08 03:47:19
Question: I'm relatively new to Giraph and I'm trying to get my Giraph edit-compile-deploy loop working for our code. I am able to run various examples inspired by http://blog.cloudera.com/blog/2014/02/how-to-write-and-run-giraph-jobs-on-hadoop/ , but I'm stuck with a ClassNotFoundException when running my modified version of the SimpleShortestPathsVertex Giraph example. I've tried various combinations of -libjars and HADOOP_CLASSPATH, but I'm out of ideas and I'd really appreciate your help. Details