Getting top n records for each group in Neo4j

Question

I need to group the data from a Neo4j database and then filter out everything except the top n records of each group.

Example:

I have two node types: Order and Article. They are connected by an "ADDED" relationship, which has a timestamp property. For every article, I want to know how many times it was among the first two articles added to an order. I tried the following approach:

1. Get all the Order-[ADDED]-Article matches.
2. Sort the result of step 1 by order id as the first sorting key, then by the timestamp of the ADDED relationship as the second sorting key.
3. For every subgroup from step 2 representing one order, keep only the top 2 rows.
4. Count distinct article ids in the output of step 3.

My problem is that I got stuck at step 3. Is it possible to get the top 2 rows for every subgroup representing an order?

Thanks,

Tiberiu

Answer 1:

Try

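MATCH (o:Order)-[r:ADDED]->(a:Article)
// Sort all ADDED relationships by order, then by timestamp
WITH o, r, a
ORDER BY o.oid, r.t
// Per order, keep the first two articles in that sorted sequence
WITH o, COLLECT(a)[..2] AS topArticlesByOrder
UNWIND topArticlesByOrder AS a
// Count how often each article appears among those first two
RETURN a.aid AS articleId, COUNT(*) AS count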

Results look like

articleId    count
   8           6
   2           2
   4           5
   7           2
   3           3
   6           5
   0           7

on this sample graph created with

FOREACH(opar IN RANGE(1,15) |
    MERGE (o:Order {oid:opar})
    FOREACH(apar IN RANGE(1,5) |
        MERGE (a:Article {aid:toInteger(RAND()*10)})
        CREATE (o)-[:ADDED {t:timestamp() - toInteger(RAND()*1000)}]->(a)
    )
)

Answer 2:

Use LIMIT combined with ORDER BY to get the top N of anything. For example, the top 5 scores would be:

MATCH (node:MyScoreNode) 
RETURN node
ORDER BY node.score DESC
LIMIT 5;

The ORDER BY part ensures the highest scores show up first. The LIMIT gives you only the first 5, which, since they're sorted, are always the highest.
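
As a rough in-memory analogy of what ORDER BY ... DESC followed by LIMIT computes, here is a small Python sketch. The function name `top_n_overall` and the sample data are made up for illustration. Note that the limit applies to the whole sorted result, not to each group separately.

```python
def top_n_overall(rows, n=5):
    """Return the n rows with the highest score across the whole input."""
    # Equivalent in spirit to: ORDER BY score DESC LIMIT n
    ordered = sorted(rows, key=lambda row: row["score"], reverse=True)
    return ordered[:n]


# Hypothetical sample data, not taken from any real graph.
scores = [
    {"name": "a", "score": 3},
    {"name": "b", "score": 9},
    {"name": "c", "score": 5},
    {"name": "d", "score": 7},
]

for row in top_n_overall(scores, n=2):
    print(row["name"], row["score"])
```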

Answer 3:

I tried to achieve your desired results and failed.

So my guess is that this is impossible with pure Cypher.

What is the problem? Cypher treats everything as paths and actually performs a traversal.
Grouping results and then applying a filter to each group would mean that Cypher has to somehow branch its traversal at certain points. But Cypher applies the filter to all results, because they are treated as a collection of different paths.

My suggestion: create several queries that together achieve the desired functionality, and implement some client-side logic.
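
A minimal sketch of what such client-side logic could look like in Python, assuming the rows (order id, ADDED timestamp, article id) have already been fetched with a simple query. The function name `count_top_n_articles` and the sample rows are hypothetical and only serve as illustration.

```python
from collections import Counter, defaultdict


def count_top_n_articles(rows, n=2):
    """Count how often each article is among the first n added to an order.

    rows: iterable of (order_id, timestamp, article_id) tuples, assumed to
    come from a query returning one row per ADDED relationship.
    """
    # Group the additions by order.
    by_order = defaultdict(list)
    for order_id, timestamp, article_id in rows:
        by_order[order_id].append((timestamp, article_id))

    counts = Counter()
    for additions in by_order.values():
        # Sort each order's additions by timestamp and keep the first n.
        additions.sort(key=lambda pair: pair[0])
        for _, article_id in additions[:n]:
            counts[article_id] += 1
    return counts


# Hypothetical sample rows: (order id, timestamp, article id).
rows = [
    (1, 100, "a"), (1, 50, "b"), (1, 75, "c"),
    (2, 10, "a"), (2, 20, "b"),
    (3, 5, "c"),
]

for article_id, count in sorted(count_top_n_articles(rows).items()):
    print(article_id, count)
```

Grouping first and sorting inside each group avoids relying on the order in which the rows arrive from the database.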

Source: https://stackoverflow.com/questions/32951651/getting-top-n-records-for-each-group-in-neo4j

Tags

neo4j