Slow performance bulk updating relationship properties in Neo4j

Submitted by 孤者浪人 on 2019-12-10 09:33:30

Question


I'm struggling to efficiently bulk update relationship properties in Neo4j. The objective is to update ~500,000 relationships (each with roughly 3 properties), which I chunk into batches of 1,000 and process, one batch per Cypher statement,

UNWIND {rows} AS row
MATCH (s:Entity) WHERE s.uuid = row.source
MATCH (t:Entity) WHERE t.uuid = row.target
MATCH (s)-[r:CONSUMED]->(t)
SET r += row.properties

however each batch of 1,000 relationships takes around 60 seconds. There is an index on the uuid property for the :Entity label, i.e. I've previously run,

CREATE INDEX ON :Entity(uuid)

which means that matching each node is very efficient per the query plan,

There are 6 total db hits and the query executes in ~150 ms. I've also added a uniqueness constraint on the uuid property, which ensures that each match returns at most one element,

CREATE CONSTRAINT ON (n:Entity) ASSERT n.uuid IS UNIQUE

Does anyone know how I can further debug this to understand why it's taking Neo4j so long to process the relationships?

Note that I'm using similar logic to update the nodes themselves, which is orders of magnitude faster even though the nodes have significantly more metadata associated with them.
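
The node-update code isn't shown in the question, but a minimal sketch of what it presumably looks like (the Cypher statement and the row shape here are assumptions, written by analogy with the relationship version below) may help frame the comparison:

```python
# Assumed Cypher for the node-update path, by analogy with the
# relationship statement; the question only says the logic is similar.
NODE_STATEMENT = """
    UNWIND {rows} AS row
    MATCH (n:Entity) WHERE n.uuid = row.uuid
    SET n += row.properties
"""

def node_rows(nodes):
    """Build the UNWIND payload for one batch of node updates.

    Hypothetical helper: each row carries the node's uuid for the
    MATCH plus the full property map for the SET.
    """
    return [
        {'uuid': node['uuid'], 'properties': dict(node)}
        for node in nodes
    ]
```

The key structural difference is that the node version needs only one indexed MATCH per row, whereas the relationship version needs two indexed MATCHes plus a relationship expansion, which is where the question suspects the cost lies.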

For reference, I'm using Neo4j 3.0.3, py2neo, and Bolt. The Python code is of the form,

for chunk in chunker(relationships):  # 1,000 relationships per chunk
    with graph.begin() as tx:
        statement = """
            UNWIND {rows} AS row
            MATCH (s:Entity) WHERE s.uuid = row.source
            MATCH (t:Entity) WHERE t.uuid = row.target
            MATCH (s)-[r:CONSUMED]->(t)
            SET r += row.properties
            """

        rows = []

        for rel in chunk:
            rows.append({
                'properties': dict(rel),
                'source': rel.start_node()['uuid'],
                'target': rel.end_node()['uuid'],
            })

        tx.run(statement, rows=rows)
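
The chunker helper referenced above isn't defined in the question; a minimal sketch of what it presumably does (split an iterable into lists of at most 1,000 items) is:

```python
from itertools import islice

def chunker(iterable, size=1000):
    """Yield successive lists of at most `size` items from `iterable`.

    Hypothetical implementation of the chunker helper used above;
    the original code isn't shown in the question.
    """
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk
```

Using islice keeps the whole relationship list from being materialised twice, which matters at the ~500,000-row scale described above.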

Answer 1:


Try this query:

UNWIND {rows} AS row
WITH row.source as source, row.target as target, row
MATCH (s:Entity {uuid:source})
USING INDEX s:Entity(uuid)
WITH * WHERE true
MATCH (t:Entity {uuid:target})
USING INDEX t:Entity(uuid)
MATCH (s)-[r:CONSUMED]->(t)
SET r += row.properties;

It uses index hints to force an index lookup for both Entity nodes, and then an Expand(Into) operator, which should be more performant than the Expand(All) and Filter operators shown in your query plan.




Answer 2:


@william-lyon I was wondering whether I need the WITH * WHERE true clause. The reason I ask is that the number of DB hits increases from 4 to 8, i.e.

PROFILE
MATCH (s:Entity {uuid:row.source})
USING INDEX s:Entity(uuid)
MATCH (t:Entity {uuid:row.target})
USING INDEX t:Entity(uuid)
MATCH (s)-[r:CONSUMED]->(t)

returns a plan with 4 total db hits,

whereas

PROFILE
MATCH (s:Entity {uuid:row.source})
USING INDEX s:Entity(uuid)
WITH * WHERE true
MATCH (t:Entity {uuid:row.target})
USING INDEX t:Entity(uuid)
MATCH (s)-[r:CONSUMED]->(t)

returns a plan with 8 total db hits.

Note that using the index hints reduces the number of DB hits from 6 down to 4. For context, we have multiple node labels (and indexes), though every node has the :Entity label.



Source: https://stackoverflow.com/questions/40725765/slow-performance-bulk-updating-relationship-properties-in-neo4j
