Question
I'm trying to run a query over 582,479 genes using the OR operator, after creating indexes on the properties symbol, primaryidentifier, secondaryidentifier and name. This is the query:
PROFILE
MATCH (g:Gene)
WHERE g.symbol = "CG11566"
   OR g.primaryidentifier = "CG11566"
   OR g.secondaryidentifier = "CG11566"
   OR g.name = "CG11566"
RETURN g.id, g.primaryidentifier, g.secondaryidentifier, g.symbol, g.name
ORDER BY g.id;
The performance is very poor: the indexes I created are not used, only a label scan, which gives 2,912,399 total db hits in 3253 ms.
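For reference, the four single-property indexes described above can be created roughly as follows. This is a sketch: the first set uses the Neo4j 3.x syntax that was current when the question was asked; the second uses the named-index syntax of Neo4j 4.0+, and the index names there (gene_symbol and so on) are made up for illustration. Run one set or the other, not both, since an equivalent index can only exist once.

```cypher
// Neo4j 3.x syntax (deprecated in 4.x, no longer accepted in 5.x)
CREATE INDEX ON :Gene(symbol);
CREATE INDEX ON :Gene(primaryidentifier);
CREATE INDEX ON :Gene(secondaryidentifier);
CREATE INDEX ON :Gene(name);

// Neo4j 4.0+ syntax; the index names are hypothetical, any unique name works
CREATE INDEX gene_symbol FOR (g:Gene) ON (g.symbol);
CREATE INDEX gene_primaryidentifier FOR (g:Gene) ON (g.primaryidentifier);
CREATE INDEX gene_secondaryidentifier FOR (g:Gene) ON (g.secondaryidentifier);
CREATE INDEX gene_name FOR (g:Gene) ON (g.name);
```

Each index covers one property of :Gene, so an equality test on any single property can be answered by an index seek; the problem in the question is only that the 3.x planner at the time did not combine them for an OR.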
So I changed the query to use UNION:
PROFILE
MATCH (g:Gene) WHERE g.symbol = 'CG11566' RETURN g.id
UNION MATCH (g:Gene) WHERE g.primaryidentifier = 'CG11566' RETURN g.id
UNION MATCH (g:Gene) WHERE g.secondaryidentifier = 'CG11566' RETURN g.id
UNION MATCH (g:Gene) WHERE g.name = 'CG11566' RETURN g.id;
This time the indexes are used: 8 total db hits in 73 ms. Much better. Is there a better way to write the query without using UNION?
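The UNION version returns only g.id, while the original query returned five columns sorted by g.id. Cypher has no clause that post-processes the combined result of a top-level UNION; CALL subqueries, added in Neo4j 4.0, allow this. A sketch, assuming Neo4j 4.0 or later and the literal 'CG11566' from the question:

```cypher
// Requires Neo4j 4.0+ (CALL subqueries)
CALL {
  MATCH (g:Gene) WHERE g.symbol = 'CG11566' RETURN g
  UNION
  MATCH (g:Gene) WHERE g.primaryidentifier = 'CG11566' RETURN g
  UNION
  MATCH (g:Gene) WHERE g.secondaryidentifier = 'CG11566' RETURN g
  UNION
  MATCH (g:Gene) WHERE g.name = 'CG11566' RETURN g
}
// UNION (not UNION ALL) already removed duplicate nodes, so each gene appears once
RETURN g.id, g.primaryidentifier, g.secondaryidentifier, g.symbol, g.name
ORDER BY g.id;
```

Each branch returns the node itself under the same column name g, which UNION requires; the projection and sorting then happen once, outside the subquery.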
Answer 1:
There is not much else you can do right now; Cypher's planner would have to get cleverer.
The UNION is, IMHO, the best solution for now.
Answer 2:
Neo4j 3.2 introduced the use of indexes with the OR operator. Great!
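On Neo4j 3.2 or later, the original OR query can therefore be kept. A sketch, with the search string passed as a query parameter named $value (a name chosen here for illustration) instead of being repeated four times; the parameter must be supplied by the driver or client. Exact operator names in the plan vary by version, but with the four indexes in place PROFILE should show index seeks rather than a label scan.

```cypher
// Neo4j 3.2+: the planner can answer each OR branch with an index seek
// $value is a hypothetical parameter holding e.g. 'CG11566'
PROFILE
MATCH (g:Gene)
WHERE g.symbol = $value
   OR g.primaryidentifier = $value
   OR g.secondaryidentifier = $value
   OR g.name = $value
RETURN g.id, g.primaryidentifier, g.secondaryidentifier, g.symbol, g.name
ORDER BY g.id;
```

Using one parameter instead of four literals also lets Neo4j reuse the cached plan when the same query is run for a different gene.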
Answer 3:
You could split the query into 4 parts (one for each condition) and collect all results into one list that is unwound in the last step:
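// OPTIONAL MATCH keeps the single row alive when a part finds no genes;
// with a plain MATCH, an empty part can eliminate the row and the query returns nothing
OPTIONAL MATCH (g1:Gene {symbol: 'CG11566'})
WITH collect(g1) AS c1
OPTIONAL MATCH (g2:Gene {primaryidentifier: 'CG11566'})
WITH c1 + collect(g2) AS c2
OPTIONAL MATCH (g3:Gene {secondaryidentifier: 'CG11566'})
WITH c2 + collect(g3) AS c3
OPTIONAL MATCH (g4:Gene {name: 'CG11566'})
WITH c3 + collect(g4) AS c4
UNWIND c4 AS gene
// a gene matching several conditions is in the list several times
WITH DISTINCT gene
// do stuff with genes found by any of the 4 parts, e.g.:
RETURN gene.id, gene.primaryidentifier, gene.secondaryidentifier, gene.symbol, gene.name
ORDER BY gene.id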
Source: https://stackoverflow.com/questions/37418207/bad-performance-with-or-operator