OrientDB fastest batch import

Submitted by 安稳与你 on 2020-01-06 14:29:09

Question


I'm trying to find the fastest way to import edges to OrientDB Graph from CSV. (My OrientDB version is 2.1.15.)

Right now I have a graph with 100k vertices and 1.5M edges. Soon I will grow it to 100M vertices and 100B+ edges, and I don't want to wait months for the import to finish :)

I've tried several approaches:

  1. Default JSON ETL. The edge load rate is about 200-300 rows/sec. Very slow: the import takes about 1.5 h. I tried changing the "tx" mode and other properties, but it made no difference in performance.

  2. Java code using the BatchGraph class. I tried different buffer sizes for transactions; the best performance was achieved with size 10. But it is still slow for me: about 45 min.

  3. Importing a special JSON format from the console (the IMPORT DATABASE command). (By the way, it is not as well suited to my task as the previous two.) It is very slow too: about 1 h.

So, is there any way to import such a graph (1.5M edges) into OrientDB in a short time? Ideal for me would be less than 1 minute. Please tell me if I can improve my code somehow.
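
For scale, the timings above can be turned into rough rates. The sketch below is plain Java with no OrientDB dependency. The class and method names (ImportRate, requiredRate, daysAt) are made up for illustration, and the 90-minute and 45-minute inputs are the approximate timings quoted above, so the results are only ballpark numbers.

```java
import java.util.Locale;

// Hypothetical helper for back-of-the-envelope import estimates.
final class ImportRate {
    /** Rows per second needed to load `rows` rows within `seconds` seconds. */
    static double requiredRate(long rows, long seconds) {
        return (double) rows / seconds;
    }

    /** Days needed to load `rows` rows at a sustained `rowsPerSecond`. */
    static double daysAt(long rows, double rowsPerSecond) {
        return rows / rowsPerSecond / 86_400.0; // 86,400 seconds per day
    }

    public static void main(String[] args) {
        long edgesNow = 1_500_000L;           // current graph
        long edgesPlanned = 100_000_000_000L; // planned graph (100B edges)

        double etlRate = requiredRate(edgesNow, 90 * 60);   // ETL run, about 1.5 h
        double batchRate = requiredRate(edgesNow, 45 * 60); // BatchGraph run, about 45 min
        double targetRate = requiredRate(edgesNow, 60);     // goal: under 1 minute

        System.out.printf(Locale.ROOT, "ETL:        %.0f edges/sec%n", etlRate);
        System.out.printf(Locale.ROOT, "BatchGraph: %.0f edges/sec%n", batchRate);
        System.out.printf(Locale.ROOT, "Target:     %.0f edges/sec%n", targetRate);
        System.out.printf(Locale.ROOT, "100B edges at target rate: %.1f days%n",
                daysAt(edgesPlanned, targetRate));
    }
}
```

By this estimate, loading 1.5M edges in one minute means about 25,000 edges/sec, and even at that rate 100B edges would take roughly 46 days.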

My ETL JSON:

{
  "source": { "file": { "path": "/opt/orientdb/orientdb-community-2.1.15/bin/csv/1_1500k_edges.csv" } },
  "extractor": { "csv": {} },
  "transformers": [
    { "merge": { "joinFieldName": "ids", "lookup": "V.id" } },
    { "vertex": { "class": "V" } },
    { "edge": { "class": "Edges",
                "joinFieldName": "ide",
                "lookup": "V.id",
                "direction": "out",
                "edgeFields": { "val": "${input.val}" },
                "unresolvedLinkAction": "CREATE" } }
  ],
  "loader": {
    "orientdb": {
      "dbURL": "remote:localhost/graph",
      "dbType": "graph",
      "wal": false,
      "tx": true,
      "batchCommit": 1000,
      "standardElementConstraints": false,
      "classes": [
        { "name": "V" },
        { "name": "Edges", "extends": "E" }
      ],
      "indexes": [
        { "class": "V", "fields": ["id:integer"], "type": "UNIQUE" }
      ]
    }
  }
}

Java code:

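import com.orientechnologies.orient.core.intent.OIntentMassiveInsert;
import com.tinkerpop.blueprints.Edge;
import com.tinkerpop.blueprints.Vertex;
import com.tinkerpop.blueprints.impls.orient.OrientGraph;
import com.tinkerpop.blueprints.util.wrappers.batch.BatchGraph;
import com.tinkerpop.blueprints.util.wrappers.batch.VertexIDType;

// Open the database and relax the Blueprints constraints for bulk loading.
this.graph = new OrientGraph(this.host, this.name, this.pass);
this.graph.setStandardElementConstraints(false);
this.graph.declareIntent(new OIntentMassiveInsert());
// buff is the BatchGraph buffer size (10 performed best in my tests).
BatchGraph<OrientGraph> bgraph = new BatchGraph<OrientGraph>(this.graph, VertexIDType.NUMBER, buff);
bgraph.setVertexIdKey("id");
// For each CSV row: parse the two vertex ids into id[0], id[1] and the edge property into val, then:
  Vertex[] vertices = new Vertex[2];
  for (int i = 0; i < 2; i++) {
    // Reuse the vertex if it already exists, otherwise create it.
    vertices[i] = bgraph.getVertex(id[i]);
    if (vertices[i] == null) vertices[i] = bgraph.addVertex(id[i]);
  }
  Edge edge = bgraph.addEdge(null, vertices[0], vertices[1], "Edges");
  edge.setProperty("val", val);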
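
The CSV parsing step is only a placeholder in the snippet above. A minimal sketch of it, using only the JDK, might look like the following. Everything here is an assumption made for illustration: the names EdgeCsv, EdgeRow, parseLine and forEachRow are hypothetical, and the file is assumed to be comma-separated with a header line and the columns ids, ide, val in that order (the field names come from the ETL config, but the actual column order is not shown in the question). Quoted fields are not handled.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.function.Consumer;

// Hypothetical CSV reader for edge rows; not part of OrientDB or Blueprints.
final class EdgeCsv {
    /** One parsed row: source vertex id, target vertex id, edge property. */
    static final class EdgeRow {
        final long from;
        final long to;
        final String val;

        EdgeRow(long from, long to, String val) {
            this.from = from;
            this.to = to;
            this.val = val;
        }
    }

    /** Parses a single "ids,ide,val" line (assumed column order). */
    static EdgeRow parseLine(String line) {
        String[] parts = line.split(",", -1);
        if (parts.length < 3) {
            throw new IllegalArgumentException("expected 3 columns: " + line);
        }
        return new EdgeRow(Long.parseLong(parts[0].trim()),
                           Long.parseLong(parts[1].trim()),
                           parts[2].trim());
    }

    /** Streams rows to `sink` one at a time, skipping the header and blank lines. */
    static void forEachRow(BufferedReader in, Consumer<EdgeRow> sink) {
        try {
            String line = in.readLine(); // header line, assumed present and ignored
            while ((line = in.readLine()) != null) {
                if (line.trim().isEmpty()) continue;
                sink.accept(parseLine(line));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Each row is handed to the callback as soon as it is read, so the file never has to fit in memory. The callback would use row.from and row.to as id[0] and id[1], and row.val as val, in the loop body above.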

Answer 1:


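I think the only way to do the import in ~1 min is to work in plocal mode, i.e. open the database directly from its directory on disk instead of going through the remote protocol: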

 this.graph = new OrientGraph("plocal:/physical/path/to/db/dir", this.name, this.pass);

If it's a one-shot import, you can simply run it from a Java program. If it's a recurring operation and you need it to run on a stand-alone server instance, you can define a server-side function to do the import and expose it through a plugin:

http://orientdb.com/docs/2.0/orientdb.wiki/Extend-Server.html



Source: https://stackoverflow.com/questions/37053190/orientdb-fastest-batchimport
