Convert csv data to graph data

狂风中的少年 submitted on 2020-06-28 03:46:28

Question


I am experimenting with Apache Giraph. I need to create a simple graph from a CSV file residing in HDFS that shows the relationship between two columns (victim related to store name). The data is more than 1 GB in CSV format. I initially tried Neo4j from Java with a local file, but in my attempts it could only load small data sets and could not import data directly from HDFS. My data may grow, so I thought of using Apache Giraph.

But how can I achieve this?

As far as I understand, Apache Giraph only takes input in a vertex format, but my data is in CSV format. Is there any tool that converts my CSV into a graph format that I can supply as input to Giraph for graph computations?


Answer 1:


I had the same doubts. While many responses suggest rewriting the graph into a standard format outside of Giraph, this is not necessary.

You should check out the implementation of the standard class:

https://apache.googlesource.com/giraph/+/refs/heads/trunk/giraph-core/src/main/java/org/apache/giraph/io/formats/IntNullTextEdgeInputFormat.java

This reads a TSV file (this is the "Text" part of the class name) containing pairs of integer vertex IDs (this is the "Int" part) of the form:

1   2
2   4
3   2
4   1
...

No edge metadata is considered, just a pair of vertices (this is the "Null" part).

This class can be readily adapted to CSV by changing the SEPARATOR pattern, or to string IDs by replacing IntWritable with Text (likewise for other types).
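To make the adaptation concrete, here is a minimal sketch of the per-line parsing that a CSV variant with string IDs would need. It is plain Java with no Giraph dependency, so it is not the Giraph class itself: `CsvEdgeLineParser` and `parse` are hypothetical names, and the sample values are made up. In a real input format this logic would sit in the reader that turns each text line into a source/target pair, with the two strings wrapped in Text. It also assumes simple CSV with exactly two columns and no quoted fields containing commas.

```java
import java.util.regex.Pattern;

/** Hypothetical helper: splits one CSV line into a (source, target) pair of string vertex IDs. */
final class CsvEdgeLineParser {
    // Comma instead of the tab/space separator used for TSV input.
    private static final Pattern SEPARATOR = Pattern.compile(",");

    private CsvEdgeLineParser() { }

    /**
     * Returns a two-element array {source, target} with surrounding whitespace trimmed.
     * Throws IllegalArgumentException unless the line has exactly two non-empty fields.
     */
    static String[] parse(String line) {
        // Limit -1 keeps trailing empty fields so "a," is rejected rather than silently shortened.
        String[] tokens = SEPARATOR.split(line, -1);
        if (tokens.length != 2) {
            throw new IllegalArgumentException("Expected 2 columns but got " + tokens.length + ": " + line);
        }
        String source = tokens[0].trim();
        String target = tokens[1].trim();
        if (source.isEmpty() || target.isEmpty()) {
            throw new IllegalArgumentException("Empty vertex ID in line: " + line);
        }
        return new String[] {source, target};
    }

    public static void main(String[] args) {
        // Made-up sample row: victim column, then store name column.
        String[] edge = parse("victim_42, Corner Store");
        System.out.println(edge[0] + " -> " + edge[1]);
    }
}
```

If the CSV contains quoted fields or a header row, a proper CSV parser and a header check would be needed instead of a plain split.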

You select the input format when you launch the job, by passing the framework a property whose value is the fully qualified name of the class that should parse the input data.
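As a rough illustration of that selection step, the sketch below assembles the arguments one might hand to `org.apache.giraph.GiraphRunner`. The option names (`-eif` for the edge input format, `-eip` for the edge input path, `-vof` for the vertex output format, `-op` for the output path, `-w` for the number of workers) are the ones I recall from GiraphRunner's help output, so check them against your Giraph version. The computation class `com.example.MyComputation` and both HDFS paths are hypothetical placeholders, and `GiraphArgsSketch` is just an illustrative name.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that builds a GiraphRunner argument list; it does not launch anything. */
final class GiraphArgsSketch {
    private GiraphArgsSketch() { }

    static List<String> buildArgs(String computationClass,
                                  String edgeInputFormatClass,
                                  String edgeInputPath,
                                  String vertexOutputFormatClass,
                                  String outputPath,
                                  int workers) {
        List<String> args = new ArrayList<>();
        args.add(computationClass);            // first positional argument: the computation to run
        args.add("-eif");
        args.add(edgeInputFormatClass);        // fully qualified name of the edge input format
        args.add("-eip");
        args.add(edgeInputPath);               // where the edge list lives in HDFS
        args.add("-vof");
        args.add(vertexOutputFormatClass);     // how results are written out
        args.add("-op");
        args.add(outputPath);
        args.add("-w");
        args.add(Integer.toString(workers));
        return args;
    }

    public static void main(String[] ignored) {
        List<String> args = buildArgs(
                "com.example.MyComputation",                               // hypothetical class
                "org.apache.giraph.io.formats.IntNullTextEdgeInputFormat",
                "/user/me/input/edges.tsv",                                // hypothetical path
                "org.apache.giraph.io.formats.IdWithValueTextOutputFormat",
                "/user/me/output/graph",                                   // hypothetical path
                1);
        System.out.println("org.apache.giraph.GiraphRunner " + String.join(" ", args));
    }
}
```

To use a customized CSV format instead, you would pass the fully qualified name of your own class as the `-eif` value and make sure its jar is on the job's classpath.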



Source: https://stackoverflow.com/questions/41606341/convert-csv-data-to-graph-data
