OrientDB ETL loading CSV with vertices in one file and edges in another

我怕爱的太早我们不能终老 提交于 2019-11-30 18:00:57

问题


I have some data that is in 2 CSV files, one contains the vertices and the other file contains the edges are in the other file. I'm working out how to set this up using ETL and am close but not quite there yet--it mostly works but my edges have properties and I'm not sure that they're loading right. This question was helpful but I'm still missing something...

Here's my data:

vertices.csv:

label,data,date
v01,0.1234,2015-01-01
v02,0.5678,2015-01-02
v03,0.9012,2015-01-03

edges.csv:

u,v,weight,date
v01,v02,12.4,2015-06-17
v02,v03,17.9,2015-09-14

I import my vertices using this:

commonVertices.json:

{
"begin": [ 
             { "let": { "name":       "$filePath",  
                        "expression": "$fileDirectory.append($fileName)" 
                      } 
             },
         ],
"config": { "log": "info"},
"source": { "file": { "path": "$filePath" } },
"extractor": { "csv": { "ignoreEmptyLines": true,
                        "nullValue": "N/A",
                        "dateFormat": "yyyy-mm-dd"
                      }
             },
"transformers": [
                    { "vertex": { "class": "myVertex" } },
                    { "code":   { "language": "Javascript",
                                  "code":     "print('    Current record: ' + record); record;" }
                    }
                ],
"loader": { "orientdb": {
            "dbURL": "plocal:my_orientdb",
            "dbType": "graph",
            "batchCommit": 1000,
            "classes": [ { "name": "myVertex", "extends", "V" },
                       ],
            "indexes": []
            }
          }
}

vertices.json:

{ "config": { "log":           "info",
              "fileDirectory": "./",
              "fileName":      "vertices.csv"
            }
}

commonEdges.json:

{
    "begin": [
        { "let": { "name": "$filePath",
                   "expression": "$fileDirectory.append($fileName )"
                 }
        },
    ],

    "config": { "log": "info"
              },

    "source": { "file": { "path": "$filePath" } },

    "extractor": { "csv": { "ignoreEmptyLines": true,
                            "nullValue": "N/A",
                            "dateFormat": "yyyy-mm-dd"
                          }
                 },

    "transformers": [
            { "merge":  { "joinFieldName": "u", "lookup": "myVertex.label" } },
            { "edge":   { "class":         "myEdge",
                          "joinFieldName": "v",
                          "lookup":        "myVertex.label",
                          "direction":     "out",
                          "unresolvedLinkAction": "NOTHING"
                        }
            },
            { "field": { "fieldNames": ["u", "v"], "operation": "remove" } }
        ],

    "loader": {
        "orientdb": {
            "dbURL": "plocal:my_orientdb",
            "dbType": "graph",
            "batchCommit": 1000,
            "useLightweightEdges": false,
            "classes": [
                { "name": "myEdge",   "extends", "E" }
            ],
            "indexes": []
        }
    }
}

edges.json:

{
    "config": {
        "log": "info",
        "fileDirectory": "./",
        "fileName": "edges.csv"
    }
}

I am running it with oetl.sh like this:

$ oetl.sh vertices.json commonVertices.json
$ oetl.sh edges.json commonEdges.json

Everything runs, but when I query the edges... I'm new to OrientDB, so maybe it is getting the properties in my edges, but when I query the edges I don't see the weight and date fields:

orientdb {db=my_orientdb}> SELECT FROM myEdge
+----+-----+------+-----+-----+
|#   |@RID |@CLASS|out  |in   |
+----+-----+------+-----+-----+
|0   |#33:0|myEdge|#25:0|#26:0|
|1   |#34:0|myEdge|#26:0|#27:0|
+----+-----+------+-----+-----+

The vertex table contains the [weight] field from my edges.csv and the [date] field is getting clobbered in a weird way. The day of the month is getting overwritten to the day from the edge.csv file, which is undesirable, but it's odd to me that the month itself isn't also getting change:

orientdb {db=my_orientdb}> SELECT FROM myVertex
+----+-----+--------+------+-------------------+-----+------+----------+---------+
|#   |@RID |@CLASS  |data  |date               |label|weight|out_myEdge|in_myEdge|
+----+-----+--------+------+-------------------+-----+------+----------+---------+
|0   |#25:0|myVertex|0.1234|2015-01-17 00:06:00|v01  |12.4  |[#33:0]   |         |
|1   |#26:0|myVertex|0.5678|2015-01-14 00:09:00|v02  |17.9  |[#34:0]   |[#33:0]  |
|2   |#27:0|myVertex|0.9012|2015-01-03 00:01:00|v03  |      |          |[#34:0]  |
+----+-----+--------+------+-------------------+-----+------+----------+---------+

I'm sure it's probably a simple tweak, any help would be great!


回答1:


In edge transformer use edgeFields to bind properties in edges. Example:

 "transformers": [
            { "merge":  { "joinFieldName": "u", "lookup": "myVertex.label" } },
            { "edge":   { "class":         "myEdge",
                          "joinFieldName": "v",
                          "lookup":        "myVertex.label",
                          "edgeFields": { "weight": "${input.weight}", "date": "${input.date}" },
                          "direction":     "out",
                          "unresolvedLinkAction": "NOTHING"
                        }

            },
            { "field": { "fieldNames": ["u", "v"], "operation": "remove" } }
        ],

Hope it helps.



来源:https://stackoverflow.com/questions/38628356/orientdb-etl-loading-csv-with-vertices-in-one-file-and-edges-in-another

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!