问题
I have a graph of tags, which are related with each other. My goal is to create a Cypher query, which will return all tags that are related to an array of input tags via 1 or 2 hops.
I made a query, which doesn't work quite as intended.
MATCH (t:Tag)
WHERE t.name IN ["A", "B", "C"]
WITH t
MATCH (a:Tag)-[:RELATED*1..2]-(t)
RETURN DISTINCT a;
This query first finds the nodes A
, B
, C
and then searches for tags, that are related to A
, B
or C
via 1 node or less.
What I want to do though is to find tags, which are related to ALL three nodes (A
, B
and C
).
I know I could concatenate MATCH
and WITH
statements, and do something like this:
MATCH (t:Tag)-[:RELATED*1..2]-(a:Tag)
WHERE t.name="A"
WITH DISTINCT a
MATCH (t:Tag)-[:RELATED*1..2]-(a)
WHERE t.name="B"
WITH DISTINCT a
MATCH (t:Tag)-[:RELATED*1..2]-(a)
WHERE t.name="C"
...
RETURN DISTINCT a;
But it runs painfully slow, when the number of input tags increase (in this case only 3 input tags: A
, B
, C
).
So is there a way to make it in one query, similar to my first try?
回答1:
Here is a solution that only requires a single MATCH
clause.
MATCH (t:Tag)-[:RELATED*..2]-(other:Tag)
WHERE t.name IN ["A", "B", "C"]
WITH t, COLLECT(DISTINCT other) AS others
WITH COLLECT(others) AS coll
RETURN FILTER(x IN coll[0] WHERE ALL(y IN coll[1..] WHERE x IN y)) AS res;
- The query finds all the tags (
other
) that are "related" (by up to 2 steps) to each of your named tags (t
). - It then uses aggregation to collect the distinct
other
nodes for eacht
. In this example, we end up with 3others
collections -- 1 for eacht
. - It then collects all the
others
collections into a singlecoll
collection. - Finally, since the result set is supposed to be the intersection of every
others
collection, the query walks through the nodes in firstothers
collection, and extracts the ones that are also in the remainingothers
collections. And, since eachothers
collection already contains distinct nodes, the result must also have distinct nodes.
In addition, if you have a lot of tags, the above query can be made a bit faster by:
- Creating an index (or uniqueness constraint, which automatically creates an index for you) on
:Tag(name)
, and then Specifying the use of that index in your query -- by inserting the following clause between the
MATCH
andWHERE
clauses. Currently, the Cypher engine does not automatically use the index for this specific query.USING INDEX t:Tag(name)
回答2:
Here is an alternative:
MATCH shortestPath((t:Tag)<-[:RELATED*1..2]-(source:Tag)) //make sure there are no duplicate paths
WHERE source.name IN ["A","B","C"] AND NOT source.name = t.name //shortest path for identical node would throw an exception
WITH COLLECT(t) as tags //all tags that were reachable, with duplicates for reachable from multiple tags
UNWIND tags as tag //for each tag
WITH tag, tags //using with as match would be a drastic slowdown
WHERE size(filter(t IN tags WHERE ID(t) = ID(tag))) = 3 //if it is connected to all three, it must have been matched three times
RETURN DISTINCT m //since any match will still be in there 3 (or n) times
It first matches all reachable tags. All tags that were reachable from all tags in a list with the length n must have been matched n times if shortestPath is used. If you then filter by that criteria (present n times) the wanted tags can be retrieved with distinct.
回答3:
How about this one:
MATCH (t:Tag)-[:RELATED*1..2]-(other:Tag)
WHERE t.name IN ["A", "B", "C"]
WITH t, collect(other.name) as others
WHERE ALL(x in ["A","B","C"] WHERE x in others)
RETURN t
The trick is put all the related nodes for t
into a collection (others) and use the ALL
predicate to make sure all of your A,B and C are part of that.
来源:https://stackoverflow.com/questions/33018818/neo4j-match-nodes-related-to-all-nodes-in-collection