Question
I store a tree in Neo4j. Each node in the tree contains an identifier and a counter, and I want to increment that counter quickly for a batch of nodes.
So I used MERGE (with ON CREATE SET and ON MATCH SET), but the performance is quite poor. It seems to be faster if I use two transactions: one to find out whether each node exists, and another to create or update it.
Do you know why MERGE is slower than a MATCH and a CREATE combined? And how can I improve its performance?
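For reference, the two-transaction variant mentioned above can be sketched roughly as follows. This is only an illustration under assumptions: the helper name plan_second_transaction is hypothetical, the set of existing numbers is assumed to come from a first read-only transaction, and the two query strings are the CREATE and MATCH ... SET queries from the benchmark.

```python
# Queries reused from the benchmark: one creates a child, one increments it.
CREATE_QUERY = "MATCH (root:Root) CREATE (root)-[:Child]->(:Node {number: {i}, count: 1})"
UPDATE_QUERY = "MATCH (root:Root)-[:Child]->(n:Node {number: {i}}) SET n.count = n.count + 1"


def plan_second_transaction(wanted, existing):
    """Return (query, params) pairs for the write transaction.

    wanted   -- numbers whose counter should be incremented
    existing -- numbers found by the first (read-only) transaction
    """
    known = set(existing)
    statements = []
    for i in wanted:
        if i in known:
            statements.append((UPDATE_QUERY, {'i': i}))
        else:
            statements.append((CREATE_QUERY, {'i': i}))
            known.add(i)  # a repeated number must update, not create again
    return statements


plan = plan_second_transaction([1, 2, 2], existing=[1])
```

Each pair could then be passed to tx.append(query, params), as the benchmark does.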
Here is a benchmark you can run to reproduce this:
import datetime
from py2neo import Graph


def bench(query, count, reset=True):
    graph = Graph()
    if reset:
        # Start from an empty graph with a unique constraint and a single root.
        graph.cypher.run("MATCH (n) OPTIONAL MATCH (n)-[r]-() DELETE n,r")
        graph.cypher.run("CREATE CONSTRAINT ON (node:Node) ASSERT node.number IS UNIQUE")
        graph.cypher.run("CREATE (r:Root)")
    start = datetime.datetime.now()
    # Send all statements in one transaction, one statement per value of i.
    tx = graph.cypher.begin()
    for i in range(count):
        tx.append(query, {'i': i})
    tx.commit()
    print('---')
    print('query : %s' % query)
    print("%i create. %s/second." % (count, count // (datetime.datetime.now() - start).total_seconds()))


if __name__ == '__main__':
    bench("MATCH (:Root)-[:Child]->(n:Node {number: {i}}) RETURN n.count", 1000)
    bench("CREATE (:Root)-[:Child]->(:Node {number: {i}, count: 1})", 1000)
    bench("MATCH (root:Root) CREATE (root)-[:Child]->(:Node {number: {i}, count: 1})", 1000)
    bench("MATCH (root:Root) MERGE (root)-[:Child]->(n:Node {number: {i}}) ON CREATE SET n.count = 1 ON MATCH SET n.count = n.count + 1", 1000)
    bench("MATCH (root:Root)-[:Child]->(n:Node {number: {i}}) SET n.count = n.count + 1", 1000)
    bench("MATCH (root:Root) CREATE UNIQUE (root)-[:Child]->(n:Node {number: {i}}) SET n.count = coalesce(n.count, 0) + 1", 1000)
And the output of that code:
---
query : MATCH (:Root)-[:Child]->(n:Node {number: {i}}) RETURN n.count
1000 create. 1151.0/second.
---
query : CREATE (:Root)-[:Child]->(:Node {number: {i}, count: 1})
1000 create. 760.0/second.
---
query : MATCH (root:Root) CREATE (root)-[:Child]->(:Node {number: {i}, count: 1})
1000 create. 1092.0/second.
---
query : MATCH (root:Root) MERGE (root)-[:Child]->(n:Node {number: {i}}) ON CREATE SET n.count = 1 ON MATCH SET n.count = n.count + 1
1000 create. 218.0/second.
---
query : MATCH (root:Root)-[:Child]->(n:Node {number: {i}}) SET n.count = n.count + 1
1000 create. 3005.0/second.
---
query : MATCH (root:Root) CREATE UNIQUE (root)-[:Child]->(n:Node {number: {i}}) SET n.count = coalesce(n.count, 0) + 1
1000 create. 283.0/second.
Thanks for your help :)
Answer 1:
Both MERGE and CREATE UNIQUE have to check for the relationship and the end node first, and only then create them.
With MERGE, it would be faster to create the child node first and then merge on the relationship.
Your variant of MERGE, with only one bound node, creates a new child node whenever the whole pattern is not matched, even if a node with that number already exists!
Try this:
MATCH (root:Root)
MERGE (n:Node {number: {i}})
ON CREATE SET n.count = 1 ON MATCH SET n.count = n.count + 1
MERGE (root)-[:Child]->(n)
Or this:
MATCH (root:Root)
MERGE (n:Node {number: {i}})
ON CREATE SET n.count = 1 ON MATCH SET n.count = n.count + 1
CREATE UNIQUE (root)-[:Child]->(n)
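To make the difference concrete, here is a toy in-memory model in Python. It is a sketch only, not Neo4j: the class ToyGraph and its method names are made up for illustration, it ignores the unique constraint (with the constraint in place, the real pattern MERGE would fail rather than create a duplicate), and it says nothing about speed. It only shows that a MERGE on the whole pattern either matches everything or creates everything that is unbound, while merging the node first reuses an existing node.

```python
class ToyGraph(object):
    """In-memory stand-in for the benchmark graph: one root plus Node records."""

    def __init__(self):
        self.nodes = []        # each entry: {'number': int, 'count': int}
        self.children = set()  # indexes of nodes linked by (root)-[:Child]->

    def _add_node(self, number):
        self.nodes.append({'number': number, 'count': 1})  # ON CREATE SET
        return len(self.nodes) - 1

    def merge_whole_pattern(self, number):
        """Models MERGE (root)-[:Child]->(n:Node {number: ...}), only root bound."""
        for idx in self.children:
            if self.nodes[idx]['number'] == number:
                self.nodes[idx]['count'] += 1  # ON MATCH SET
                return idx
        # The full pattern was not found, so the node AND the relationship are
        # created, even if an unattached node with this number already exists.
        idx = self._add_node(number)
        self.children.add(idx)
        return idx

    def merge_node_then_relationship(self, number):
        """Models MERGE (n:Node {number: ...}) then MERGE (root)-[:Child]->(n)."""
        for idx, node in enumerate(self.nodes):
            if node['number'] == number:
                node['count'] += 1  # ON MATCH SET
                break
        else:
            idx = self._add_node(number)
        self.children.add(idx)  # no-op if the link already exists
        return idx


def graph_with_unattached_node(number):
    """Build a graph holding one Node that is not yet linked to the root."""
    graph = ToyGraph()
    graph.nodes.append({'number': number, 'count': 1})
    return graph


a = graph_with_unattached_node(7)
a.merge_whole_pattern(7)           # leaves two nodes numbered 7
b = graph_with_unattached_node(7)
b.merge_node_then_relationship(7)  # reuses the existing node
```

In this model, a ends up with two nodes numbered 7, while b keeps a single node whose count has been incremented.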
Answer 2:
You can take a closer look at your queries with PROFILE, either in the web interface or via the neo4j-shell: http://neo4j.com/docs/stable/how-do-i-profile-a-query.html
This might help you see why the MERGE is so slow.
PROFILE MATCH (root:Root) MERGE (root)-[:Child]->(n:Node {number: 1})
ON CREATE SET n.count = 1 ON MATCH SET n.count = n.count + 1
It would be interesting to see if and why MERGE with only ON MATCH is slower than MATCH ... SET. Maybe Cypher uses indexes differently for the two queries; PROFILE also tells you about this.
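If you want to profile each benchmark query, a small helper can turn the parameterised strings into something you can paste into the shell. This is a minimal sketch: the function name to_profile is hypothetical, and it only inlines simple integer parameters written in the {name} style used in the question.

```python
def to_profile(query, **params):
    """Prefix a Cypher query with PROFILE and inline integer parameters."""
    for name, value in params.items():
        # Replace the placeholder "{name}" with the literal value.
        query = query.replace('{%s}' % name, str(int(value)))
    return 'PROFILE ' + query


merge_query = ("MATCH (root:Root) MERGE (root)-[:Child]->(n:Node {number: {i}}) "
               "ON CREATE SET n.count = 1 ON MATCH SET n.count = n.count + 1")
profiled = to_profile(merge_query, i=1)
print(profiled)
```

Running the resulting string for both the MERGE and the MATCH ... SET variants lets you compare the two plans side by side.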
Source: https://stackoverflow.com/questions/30403504/neo4j-merge-performance-vs-create-set