Find all events between 2 dates

|▌冷眼眸甩不掉的悲伤 提交于 2019-12-18 13:38:01

问题


Trying to figure out the following scenario.

Setup:

  • Used the suggestions in this post to create a Time Tree http://www.markhneedham.com/blog/2014/04/19/neo4j-cypher-creating-a-time-tree-down-to-the-day/
  • Extended it to go to the Hour and Minutes (for the sake of simplicity, using 0,15,30,45
  • Connected the Minutes with :NEXT (similar to what's done on days, but on minutes)
  • Created several events attached to the minutes with "STARTS_AT" and "ENDS_AT"

Goals

  • Find all events occurring between two dates
  • Find conflicting events (events that overlap)

Attempted the following query:

// Get all events between 2 times
MATCH (m1:Minute { minute: 15 })<--(h:Hour { hour: 8 })<--(d:Day { day: 24 })<--(month:Month { month: 10 })
MATCH (m2:Minute { minute: 45 })<--(h2:Hour { hour: 10 })<--(d2:Day { day: 24 })<--(month2:Month { month: 10 })
MATCH p=((m1)-[:NEXT*]->(m2))
WITH NODES(p) as pnodes UNWIND pnodes AS pwind
MATCH (e:Event)-[:STARTS_AT]->(pwind)
RETURN pwind,e

The results are indeed being retrieved, but noticed that:

  1. If I try to use "shortestpath" instead of regular path, the query only works for a certain threshold (bizarre, can't explain this). For example, if the timespan is more than 2-3 hours it doesn't work; same applies to multiple days, etc.
  2. Without using "shortestpath", the performance is TERRIBLE. 25 seconds to return only 2-3 items.

Another variation using where (tried it only for future dates):

// Get all events between 2 times
MATCH (e:Event)
WHERE (:Month { month: 10 })-->(:Day { day: 24 })-->(:Hour { hour: 9 })-->(:Minute { minute: 00})-[:NEXT*]->(:Minute)<--(e)
RETURN e

Results: the performance is even WORSE. 100 seconds to retrieve 1 item.

The way I understand and would like to do this, is by using some sort of function that allows the path to return related nodes. This is: path function returns only the specific node being queried (in this case Minutes), but I would like to bring for ALL THOSE MINUTES, the Events associated by ":STARTS_AT".

Finally, the questions:

  • What's the recommended way to perform this query?
  • Is this a scenario that is supported by Cypher and neo4j?
  • Would it be preferrable to "fall back" to property based time querying instead of trying to attempt graph based time queries?

Thanks in advance.


回答1:


So there's this weird thing with shortestPath where if you don't specify a maximum length, it arbitrarily sets the max to 15. See here:

ShortestPath doesn't find any path without max hops limit

I would actually call this a bug, because it's not in the documentation and it leads to unexpected behavior, as you've found.

So the solution to your problem is to use shortestPath but pick a maximum length. I'd choose something really high; let's do a billion and call it a day:

MATCH (:Year {year:2015})-[:HAS_MONTH]->(:Month {month:10})-[:HAS_DAY]->(:Day {day:23})-[:HAS_HOUR]->(:Hour {hour:8})-[:HAS_MINUTE]->(startMinute:Minute {minute:15})
MATCH (:Year {year:2015})-[:HAS_MONTH]->(:Month {month:10})-[:HAS_DAY]->(:Day {day:24})-[:HAS_HOUR]->(:Hour {hour:10})-[:HAS_MINUTE]->(endMinute:Minute {minute:45})
MATCH p = shortestPath((startMinute)-[:NEXT*..1000000000]->(endMinute))
UNWIND NODES(p) AS minute
MATCH (event:Event)-[:STARTS_AT]->(minute)
RETURN event, minute;

You should always use shortestPath for finding the span of minute nodes; matching on (startMinute)-[:NEXT*]->(endMinute) without wrapping it in shortestPath is trying to find all paths of any length between the two nodes, so it's exhaustive and takes much longer, whereas shortestPath can stop as soon as it's found the path.

As far as finding if any other events overlap with a certain event:

MATCH (startMinute:Minute)<-[:STARTS_AT]-(event:Event)-[:ENDS_AT]->(endMinute:Minute)
WHERE event.id = {event_id}
MATCH p = shortestPath((startMinute)-[:NEXT*..1000000000]->(endMinute))
UNWIND NODES(p) AS span
MATCH (overlap:Event)-[:STARTS_AT|ENDS_AT]->(span)
WHERE overlap <> event
RETURN overlap;

Below is an appendix of how the data was created for proof-of-concept purposes. Assume all months have 31 days.

Constraints and indexes.

CREATE CONSTRAINT ON (year:Year) ASSERT year.year IS UNIQUE;
CREATE INDEX ON :Month(month);
CREATE INDEX ON :Day(day);
CREATE INDEX ON :Hour(hour);
CREATE INDEX ON :Minute(minute);

Create the time tree.

WITH RANGE(2014, 2015) AS years, RANGE(1, 12) AS months, RANGE(1, 31) AS days, RANGE(0,23) AS hours, RANGE(0, 45, 15) AS minutes
FOREACH(year IN years | 
  MERGE (y:Year {year: year})
  FOREACH(month IN months | 
    CREATE (m:Month {month: month})
    MERGE (y)-[:HAS_MONTH]->(m)
    FOREACH(day IN days |      
      CREATE (d:Day {day: day})
      MERGE (m)-[:HAS_DAY]->(d)
      FOREACH(hour IN hours |
        CREATE (h:Hour {hour: hour})
        MERGE (d)-[:HAS_HOUR]->(h)
        FOREACH(minute IN minutes |
          CREATE (min:Minute {minute: minute})
          MERGE (h)-[:HAS_MINUTE]->(min)
        )
      )
    )
  )
);

Create [:NEXT] relationships between all the Minute nodes.

MATCH (year:Year)-[:HAS_MONTH]->(month:Month)-[:HAS_DAY]->(day:Day)-[:HAS_HOUR]->(hour:Hour)-[:HAS_MINUTE]->(minute:Minute)
WITH year, month, day, hour, minute
ORDER BY year.year, month.month, day.day, hour.hour, minute.minute
WITH COLLECT(minute) AS minutes
FOREACH(i IN RANGE(0, LENGTH(minutes) - 2) | 
    FOREACH(min1 IN [minutes[i]] | 
        FOREACH(min2 IN [minutes[i + 1]] | 
            CREATE UNIQUE (min1)-[:NEXT]->(min2)
        )
    )
);

Randomly simulate events and their start times.

MATCH (minute:Minute)
WHERE RAND() < 0.3
CREATE (event:Event)-[:STARTS_AT]->(minute);

Make all events 5 minute blocks long.

MATCH (event:Event)-[:STARTS_AT]->(startMinute:Minute)-[:NEXT*5]->(endMinute:Minute)
CREATE (event)-[:ENDS_AT]->(endMinute);


来源:https://stackoverflow.com/questions/32685803/find-all-events-between-2-dates

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!