Optimize Neo4j Cypher query

Question

I want to get all profiles* that are connected to a user's profile* by a specific directed relationship. If any of those profiles has an alternate profile*, I want that one as well in case the user's alternate profile* has a relationship to it. I also need the direction of the relationships.

My problem is that with about 10,000 nodes it takes about 5 seconds to get the data. I have auto-indexing enabled on nodes and relationships.

This is how my nodes are related:

User-[:profile]->ProfileA-[:related]->ProfileB<-[?:me]->ProfileB2<-[?:related]-ProfileA2<-[:profile]-User

My query looks like this:

START User=node({source}) 
MATCH User-[:profile]->ProfileA-[rel:related]->ProfileB 
WHERE User-->ProfileA-->ProfileB 
WITH ProfileA, rel, ProfileB 
MATCH ProfileB<-[?:me]->ProfileB2<-[relB?:related]-ProfileA2<-[:profile]-User 
WHERE relB IS NULL OR User-->ProfileA-->ProfileB<-->ProfileB2<--ProfileA2<--User
RETURN ProfileB, COLLECT(ProfileB2), rel, relB
LIMIT 25

Any idea how I can optimize the query?

* profiles: ProfileB
* user's profile: ProfileA
* alternate profile: ProfileB2
* user's alternate profile: ProfileA2

Answer 1:

You're using WHERE clauses where you don't need them. Take the first one, for example:

WHERE User-->ProfileA-->ProfileB

This clause says "restrict the results only to users that have a relationship to a ProfileA which itself has a relationship to a ProfileB". However, that is already guaranteed to be true by your match clause. You're wasting CPU cycles re-verifying something that is already true.

WITH ProfileA, rel, ProfileB

You aren't doing any sort of aggregation, calculation or reassignment, so there is no need for this WITH clause. You can continue on without it.

WHERE relB IS NULL OR User-->ProfileA-->ProfileB<-->ProfileB2<--ProfileA2<--User

Again, you're not getting any value out of this WHERE clause. This one says "restrict the results to paths where a relB wasn't found OR where one was found with the following path..." and then you list the exact same path that was in your MATCH.

So, remove all those extraneous clauses and you get this:

START User=node({source}) 
MATCH User-[:profile]->ProfileA-[rel:related]->ProfileB<-[?:me]->ProfileB2<-[relB?:related]-ProfileA2<-[:profile]-User 
RETURN ProfileB, COLLECT(ProfileB2), rel, relB
LIMIT 25

Try that and see if the performance is any better. If it's not enough then you may need to add more information to your question -- for my own part, I don't fully understand what your relationships actually mean (for example, what is the "me" relationship? what does it symbolize?)
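
For readers on a newer Neo4j version: the queries in this thread use legacy Cypher syntax. Optional relationships written as [?:type] were replaced by OPTIONAL MATCH in Neo4j 2.0, and later versions dropped START as well. The simplified query above might be sketched roughly as follows. This is an untested approximation, not the answerer's query. It assumes Neo4j 3.x or later for the $source parameter form (2.x uses {source}), it assumes the "me" relationship may be matched in either direction, and the alias alternates is introduced here only for illustration.

```cypher
// Sketch only: assumes Neo4j 3.x+ parameter syntax ($source).
// id(User) mirrors the legacy node-id lookup START User=node({source}).
MATCH (User)-[:profile]->(ProfileA)-[rel:related]->(ProfileB)
WHERE id(User) = $source
// The optional parts of the old pattern become OPTIONAL MATCH clauses;
// ProfileB2 and relB are null when no match exists.
OPTIONAL MATCH (ProfileB)-[:me]-(ProfileB2)
OPTIONAL MATCH (User)-[:profile]->(ProfileA2)-[relB:related]->(ProfileB2)
// "alternates" is a hypothetical alias, not a name from the original question.
RETURN ProfileB, collect(ProfileB2) AS alternates, rel, relB
LIMIT 25
```

Splitting the optional parts into separate clauses keeps the required path (User to ProfileB) in a plain MATCH, so rows are only dropped when that required path is missing.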

Answer 2:

This is how I solved it:

START User=node({source}) 
MATCH User-[:profile]->ProfileA-[rel:related]->ProfileB<-[?:me]->ProfileB2-[relB?:related]-ProfileA2
WHERE relB IS NULL OR User-[:profile]->ProfileA2
RETURN ProfileB, COLLECT(ProfileB2), rel, relB
LIMIT 25

Including the ProfileA2<-[:profile]-User part in the MATCH pattern seemed to produce an endless loop, so I moved it into the WHERE clause.

Recommendations are still welcome.

Source: https://stackoverflow.com/questions/15146883/optimize-neo4j-cypher-query

Tags

neo4j

cypher