Really slow load speed Neo4jClient C# LoadCsv

Submitted by 回眸只為那壹抹淺笑 on 2019-12-11 08:42:42

Question


The code I use now is really slow, at about 20 inserts per second, and it uses a splitter to create multiple CSV files to load. Is there a proper way to use "USING PERIODIC COMMIT 1000" with the Neo4jClient for .NET?

    public async Task InsertEdgesByName(List<string> nodeListA, List<string> nodeListB,
        List<int> weightList, string type)
    {
        for (var i = 0; i < nodeListA.Count; i += 200)
        {
            using (var sw = new StreamWriter(File.OpenWrite($"tempEdge-{type}.csv")))
            {
                sw.Write("From,To,Weight\n");
                // && short-circuits, so the bounds check is skipped once the chunk is full
                for (var j = i; j < i + 200 && j < nodeListA.Count; j++)
                {
                    sw.Write($"{nodeListA[j]}," +
                             $"{nodeListB[j]}," +
                             $"{weightList[j]} + id:{j}" +
                             $"\n");
                }
            }
            var f = new FileInfo($"tempEdge-{type}.csv");

            await Client.Cypher
                .LoadCsv(new Uri("file://" + f.FullName), "rels", true)
                .Match("(from {label: rels.From}), (to {label: rels.To})")
                .Create($"(from)-[:{type} {{weight: rels.Weight}}]->(to);")
                .ExecuteWithoutResultsAsync();

            _logger.LogDebug($"{DateTime.Now}\tEdges inserted\t\tedges inserted: {i}");
        }
    }

To create the nodes I use

        await Client.Cypher
            .Create("INDEX ON :Node(label);")
            .ExecuteWithoutResultsAsync();

        await Client.Cypher
            .LoadCsv(new Uri("file://" + f.FullName), "csvNode", true)
            .Create("(n:Node {label:csvNode.label, source:csvNode.source})")
            .ExecuteWithoutResultsAsync();

The index on label does not seem to change the speed of either insert statement. I have about 200,000 edges to insert; at 20 per second this would take hours. Being able to add USING PERIODIC COMMIT 1000 would clean up my code but wouldn't improve performance by much.

Is there a way to speed up inserts? I know Neo4jClient is not the fastest, but I would really like to stay within the ASP.NET environment.

SimpleNode class

public class SimpleNodeModel
{
    public long id { get; set; }
    public string label { get; set; }
    public string source { get; set; } = "";

    public override string ToString()
    {
        return $"label: {label}, source: {source}, id: {id}";
    }

    public SimpleNodeModel(string label, string source)
    {
        this.label = label;
        this.source = source;
    }

    public SimpleNodeModel() { }

    public static string Header => "label,source";

    public string ToCSVWithoutID()
    {
        return $"{label},{source}";
    }
}

Cypher code

USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM 'file://F:/edge.csv' AS rels
MATCH (from {label: rels.From}), (to {label: rels.To})
CREATE (from)-[:edge {weight: rels.Weight}]->(to);

Answer 1:


The Cypher query at the bottom is slow because its MATCH does not use node labels, so it can never use the index to find the nodes quickly. Instead it has to scan every node in your database twice: once for from, and again for to.

The label property on your nodes is not the same thing as a node label. Since you created the nodes with the :Node label, reuse that label in your MATCH:

...
MATCH (from:Node {label: rels.From}), (to:Node {label: rels.To})
...

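Applied to the C# query from the question, the same change might look like the sketch below. It only reuses the Neo4jClient calls already shown in the question; the `EdgeLoader` class, the `LoadEdgesAsync` method, and the `csvPath` parameter are hypothetical names introduced for illustration, and `client` is assumed to be an already connected graph client.

```csharp
using System;
using System.Threading.Tasks;
using Neo4jClient;

public static class EdgeLoader
{
    // Hypothetical helper: client, csvPath and type stand in for the
    // values used in the question's InsertEdgesByName method.
    public static Task LoadEdgesAsync(IGraphClient client, string csvPath, string type)
    {
        return client.Cypher
            .LoadCsv(new Uri("file://" + csvPath), "rels", true)
            // Adding :Node lets the planner look nodes up through the
            // index on :Node(label) instead of scanning every node.
            .Match("(from:Node {label: rels.From}), (to:Node {label: rels.To})")
            .Create($"(from)-[:{type} {{weight: rels.Weight}}]->(to)")
            .ExecuteWithoutResultsAsync();
    }
}
```

Prefixing the generated query with PROFILE in the Neo4j browser is one way to check whether the index is actually being used.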


Answer 2:


Periodic commit isn't supported in the version of Neo4jClient you're using. I've just committed a change that will be published shortly (2.0.0.7), which you can then use:

.LoadCsv(new Uri("file://" + f.FullName), "rels", true, periodicCommit:1000)

which will generate the correct Cypher.

It's on its way and should be available in 5 minutes or so, depending on NuGet indexing time.

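With periodic commit available, the 200-row splitter from the question should no longer be needed: all edges can go into one CSV file that is loaded in a single call. The following is a minimal sketch under that assumption, not a tested implementation. `BulkEdgeLoader` and `InsertEdgesAsync` are hypothetical names, the installed Neo4jClient version is assumed to include the `periodicCommit` parameter described above, and the values are assumed to contain no commas or quotes, since no CSV escaping is done.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading.Tasks;
using Neo4jClient;

public static class BulkEdgeLoader
{
    // Hypothetical replacement for the chunked loop in the question.
    public static async Task InsertEdgesAsync(
        IGraphClient client,
        IReadOnlyList<string> nodeListA,
        IReadOnlyList<string> nodeListB,
        IReadOnlyList<int> weightList,
        string type)
    {
        var path = Path.GetFullPath($"tempEdge-{type}.csv");

        // One header line followed by one line per edge.
        // File.WriteAllLines overwrites any existing file.
        var rows = Enumerable.Range(0, nodeListA.Count)
            .Select(i => $"{nodeListA[i]},{nodeListB[i]},{weightList[i]}");
        File.WriteAllLines(path, new[] { "From,To,Weight" }.Concat(rows));

        await client.Cypher
            // periodicCommit: 1000 asks Neo4j to commit every 1000 rows.
            .LoadCsv(new Uri("file://" + path), "rels", true, periodicCommit: 1000)
            // The :Node label is the fix from Answer 1, so the index is used.
            .Match("(from:Node {label: rels.From}), (to:Node {label: rels.To})")
            .Create($"(from)-[:{type} {{weight: rels.Weight}}]->(to)")
            .ExecuteWithoutResultsAsync();
    }
}
```

The file has to be readable by the Neo4j server process, so this only works as written when the server runs on the same machine and is allowed to import from that path.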


Source: https://stackoverflow.com/questions/41763999/really-slow-load-speed-neo4jclient-c-sharp-loadcsv
