Why do I get a “Cartesian Product” warning?

六眼飞鱼酱① submitted on 2019-12-05 06:38:10

If you are MATCHing on two different labels without any relationship between them, you'll get this warning. The reason is that if you write:

MATCH (a:Foo), (b:Bar)

Neo4j has to find every possible combination of those two nodes. For the first match of a it returns a row for every match of b, for the second match of a it again returns a row for every match of b, and so on. So you'll get (number of Foo nodes) x (number of Bar nodes) total rows in your result. Because the row count is the product of the two, this gets really bad for performance as your database grows.
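To make the row count concrete, here is a small Python sketch of the same pairing. It is only a toy model, not how Neo4j executes anything, and the node names in the two lists are made up for illustration.

```python
from itertools import product

# Hypothetical stand-ins for the nodes matched by (a:Foo) and (b:Bar).
foo_nodes = ["foo1", "foo2", "foo3"]
bar_nodes = ["bar1", "bar2"]

# One row per (a, b) pairing, which is what the disconnected pattern asks for.
rows = list(product(foo_nodes, bar_nodes))

print(len(rows))  # 3 * 2 = 6
```

Adding one more Foo node adds a whole extra row for every Bar node, which is why the result grows so quickly.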

I can see that you're filtering on version for Form and text for Question, so that will help. It may even give you just one Form node and one Question node. So as long as you have indexes on Form(version) and Question(text), the query should be quite quick. Neo4j can't tell (or at least, isn't currently implemented to be able to tell) how many rows are going to be returned, so it gives a warning saying that your query could potentially be slow.
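As a rough illustration of why the filters matter, the sketch below compares the product with and without them. The property values are invented sample data (only "1.0" and "Sector de la empresa" come from the question), and plain Python lists stand in for what an index lookup would return.

```python
from itertools import product

# Hypothetical property values for the two labels.
form_versions = ["1.0", "1.1", "2.0"]
question_texts = ["Sector de la empresa", "Nombre", "Ciudad", "País"]

# Without predicates: every Form paired with every Question.
unfiltered = list(product(form_versions, question_texts))

# With the predicates applied to each side first.
forms = [v for v in form_versions if v == "1.0"]
questions = [t for t in question_texts if t == "Sector de la empresa"]
filtered = list(product(forms, questions))

print(len(unfiltered), len(filtered))  # 12 1
```

It is still a cartesian product either way; the filters just shrink each side to one element, so the product has a single row.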

They are all cartesian

Having read your question, for a second there, my Cypher world imploded - all three queries should involve a cartesian product.

Having checked (on both the console and a local DB - both version 3.3.0), it turns out I'm sane - they do all involve a cartesian product.

Why there is only a warning in the first case (still in version 3.3.0), I have no clue. You simply need to run the planner to figure this out, and if that isn't what fires the warning, what does? Some dumb Cypher logic?

Cypher basics

Cypher queries are made of parts, each of which is either an update (write) part or a read part.

As far as read parts are concerned, this is what happens:

  • Neo4j picks a starting point which it reckons will yield the fewest 'hits'. It will go through each of these hits...
  • ...traversing the graph using the node/relationship pattern.
  • It does so repeatedly until no more patterns are matched.

If you have something like this:

(a {name:'Bill'})-->(b:Dog)

The plan might look something like this.

  • For each node (AKA AllNodesScan):
    • Filter based on the predicate (name == 'Bill')
    • Get all outgoing --> relationships
    • For each relationship:
      • Get the end node
      • Filter based on predicate (:Dog)

The important thing is that, whilst we need to scan every node to find (a), we simply traverse the graph to find the (b)s; no AllNodesScan is needed for the latter.

(There are variants of AllNodesScan, see Starting Node Operators)
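The plan outlined above can be sketched in Python roughly as follows. This is a toy in-memory model, not Neo4j's actual implementation; the graph data and the function name match_bill_to_dogs are hypothetical.

```python
# Toy in-memory graph; every name here is hypothetical.
nodes = {
    1: {"labels": {"Person"}, "name": "Bill"},
    2: {"labels": {"Dog"}, "name": "Rex"},
    3: {"labels": {"Cat"}, "name": "Tom"},
    4: {"labels": {"Dog"}, "name": "Fido"},
}
# Outgoing relationships as adjacency lists: start node id -> end node ids.
outgoing = {1: [2, 3], 4: [1]}


def match_bill_to_dogs():
    rows = []
    for a_id, a in nodes.items():                # scan every node once
        if a["name"] != "Bill":                  # filter on the name predicate
            continue
        for b_id in outgoing.get(a_id, []):      # follow outgoing relationships
            if "Dog" in nodes[b_id]["labels"]:   # filter the end node on :Dog
                rows.append((a_id, b_id))
    return rows


result = match_bill_to_dogs()
print(result)  # [(1, 2)]
```

Note that there is only one loop over all nodes. The (b) side is reached purely by following relationships from each matching (a), so Fido is never considered even though it is a Dog.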

When your query is something like this:

MATCH (f:Form {version: "1.0"}), (q:Question {text: "Sector de la empresa"})

Neo4j is forced to do a separate scan (an AllNodesScan or one of its variants) for each of f and q, because there is no pattern to traverse between them. This can potentially create a result set of size (number of f matches) x (number of q matches), which could be huge.
