Bad performance with OR operator

Submitted by 半腔热情 on 2019-12-22 05:21:46

Question


I'm trying to run a query over 582479 genes using the OR operator, after creating indexes on the properties symbol, primaryidentifier, secondaryidentifier and name. This is the query:

PROFILE 
MATCH(g:Gene) WHERE g.symbol="CG11566" OR 
                    g.primaryidentifier="CG11566" OR
                    g.secondaryidentifier="CG11566" OR 
                    g.name="CG11566" 
RETURN g.id, g.primaryidentifier, g.secondaryidentifier, g.symbol, g.name
ORDER BY g.id;

The performance is very poor: the indexes I created are not used, only a label scan -> 2912399 total db hits in 3253 ms

I changed the query to use UNION:

PROFILE 
      MATCH(g:Gene) WHERE g.symbol='CG11566' return g.id 
UNION MATCH(g:Gene) WHERE g.primaryidentifier='CG11566' return g.id 
UNION MATCH(g:Gene) WHERE g.secondaryidentifier='CG11566' return g.id 
UNION MATCH(g:Gene) WHERE g.name='CG11566' return g.id;

The indexes are used -> 8 total db hits in 73 ms. Much better. Is there a better way to write the query without using UNION?


Answer 1:


There is not much else you can do right now; Cypher's planner would have to get cleverer.

The UNION is imho the best solution right now.
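The UNION version in the question returns only g.id and drops the ORDER BY of the original query. On Neo4j 4.0 or later (newer than this thread), where CALL { ... } subqueries are available, one way to keep the per-property index lookups while still returning all five properties sorted by g.id is to wrap the union in a subquery. This is a sketch assuming a 4.0+ server and the same Gene label and property names as in the question:

```cypher
// Assumes Neo4j 4.0+ (CALL { ... } subqueries). Each branch filters on a single
// property, so it can use that property's index; UNION removes duplicate genes
// that match in more than one branch.
CALL {
  MATCH (g:Gene) WHERE g.symbol = 'CG11566' RETURN g
  UNION
  MATCH (g:Gene) WHERE g.primaryidentifier = 'CG11566' RETURN g
  UNION
  MATCH (g:Gene) WHERE g.secondaryidentifier = 'CG11566' RETURN g
  UNION
  MATCH (g:Gene) WHERE g.name = 'CG11566' RETURN g
}
// Post-union processing applies to the combined result
RETURN g.id, g.primaryidentifier, g.secondaryidentifier, g.symbol, g.name
ORDER BY g.id;
```

Because the ordering and projection sit outside the subquery, they apply to all matched genes at once rather than to one branch.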




Answer 2:


Neo4j 3.2 introduced index usage for predicates combined with the OR operator. Great!
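On Neo4j 3.2 or later, the original OR query can therefore be kept as written, including its ORDER BY. A quick way to confirm the indexes are picked up is to profile it again and check that the plan no longer starts from a NodeByLabelScan over all Gene nodes; an index-backed plan typically shows one index seek per property, merged by union and distinct operators, although the exact operator names vary between versions:

```cypher
// Same query as in the question; on 3.2+ inspect the PROFILE output for
// index seeks on :Gene(symbol), :Gene(primaryidentifier), etc. instead of
// a label scan, and compare the total db hits with the earlier 2912399.
PROFILE
MATCH (g:Gene)
WHERE g.symbol = 'CG11566'
   OR g.primaryidentifier = 'CG11566'
   OR g.secondaryidentifier = 'CG11566'
   OR g.name = 'CG11566'
RETURN g.id, g.primaryidentifier, g.secondaryidentifier, g.symbol, g.name
ORDER BY g.id;
```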




Answer 3:


You could split the query into 4 parts (one for each condition), collect all results into one list, and unwind that list in the last step:

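// OPTIONAL MATCH keeps the query running when one lookup finds no gene
// (a plain MATCH would yield zero rows there and end the whole query);
// collect() ignores the null that OPTIONAL MATCH produces in that case.
OPTIONAL MATCH (g1:Gene {symbol: 'CG11566'})
WITH collect(g1) AS c1
OPTIONAL MATCH (g2:Gene {primaryidentifier: 'CG11566'})
WITH c1, collect(g2) AS found
WITH c1 + found AS c2
OPTIONAL MATCH (g3:Gene {secondaryidentifier: 'CG11566'})
WITH c2, collect(g3) AS found
WITH c2 + found AS c3
OPTIONAL MATCH (g4:Gene {name: 'CG11566'})
WITH c3, collect(g4) AS found
WITH c3 + found AS c4
UNWIND c4 AS gene
// A gene matching several properties appears once per match; keep it once
WITH DISTINCT gene
RETURN gene.id, gene.primaryidentifier, gene.secondaryidentifier, gene.symbol, gene.name
ORDER BY gene.id;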


Source: https://stackoverflow.com/questions/37418207/bad-performance-with-or-operator
