Neo4j how to model a time-versioned graph

南方客 2020-12-29 11:48

Part of my graph has the following schema:

The main part of the graph is the domain, which has some persons linked to it. Person has a unique constraint on the e

2 answers
  •  旧巷少年郎
    2020-12-29 12:17

    This answer is based on Ian Robinson's post about time-based versioned graphs.

    I don't know if this answer covers ALL the requirements of the question, but I believe it can provide some insights.

    Also, I'm assuming you are only interested in structural versioning (that is, you are not interested in queries about changes to the domain user's name over time). Finally, I'm using a partial representation of your graph model, but I believe the concepts shown here can be applied to the whole graph.

    The initial graph state:

    Consider this Cypher to create an initial graph state:

    CREATE (admin:Admin)
    
    CREATE (person1:Person {person_id : 1})
    CREATE (person2:Person {person_id : 2})
    CREATE (person3:Person {person_id : 3})
    
    CREATE (domain1:Domain {domain_id : 1})
    
    CREATE (device1:Device {device_id : 1})
    
    CREATE (person1)-[:ADMIN {from : 0, to : 1000}]->(admin)
    
    CREATE (person1)-[:CONNECTED_DEVICE {from : 0, to : 1000}]->(device1)
    
    CREATE (domain1)-[:MEMBER]->(person1)
    CREATE (domain1)-[:MEMBER]->(person2)
    CREATE (domain1)-[:MEMBER]->(person3)
    

    Result:

    The above graph has 3 person nodes, all members of one domain node. The person node with person_id = 1 is connected to the device with device_id = 1, and is also the current administrator.

    The properties from and to on the :ADMIN and :CONNECTED_DEVICE relationships are used to manage the history of the graph structure: from represents a start point in time and to an end point in time. For simplicity, I'm using 0 as the initial time of the graph and 1000 as the end-of-time (EOT) constant. In a real-world graph, the current time in milliseconds can be used to represent time points, and Long.MAX_VALUE can be used as the EOT constant. A relationship with to = 1000 has no upper bound yet on the period associated with it, which means it is still current.
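
    As a minimal sketch of that real-world variant (assuming time points in epoch milliseconds and 9223372036854775807, the value of Long.MAX_VALUE, as the EOT constant), a current connection could be created as shown here. The ids are simply the ones from the example graph:

    // Look up both endpoints of the new relationship
    MATCH (person:Person {person_id : 1})
    MATCH (device:Device {device_id : 1})
    // from = now in milliseconds, to = EOT constant (assumed to be Long.MAX_VALUE)
    CREATE (person)-[:CONNECTED_DEVICE {from : timestamp(), to : 9223372036854775807}]->(device)

    With that choice, queries for the current state would match on {to : 9223372036854775807} instead of {to : 1000}.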

    Queries:

    With this graph, to get the current administrator I can do:

    MATCH (person:Person)-[:ADMIN {to:1000}]->(:Admin)
    RETURN person
    

    The result will be:

    ╒═══════════════╕
    │"person"       │
    ╞═══════════════╡
    │{"person_id":1}│
    └───────────────┘
    

    Given a device, to get the current connected user:

    MATCH (:Device {device_id : 1})<-[:CONNECTED_DEVICE {to : 1000}]-(person:Person)
    RETURN person
    

    Resulting:

    ╒═══════════════╕
    │"person"       │
    ╞═══════════════╡
    │{"person_id":1}│
    └───────────────┘
    

    Both queries use the end-of-time constant to find the current administrator and the person currently connected to a device.

    Query the device connect / disconnect events:

    MATCH (device:Device {device_id : 1})<-[r:CONNECTED_DEVICE]-(person:Person)
    RETURN person AS person, device AS device, r.from AS from, r.to AS to
    ORDER BY r.from
    

    Resulting:

    ╒═══════════════╤═══════════════╤══════╤════╕
    │"person"       │"device"       │"from"│"to"│
    ╞═══════════════╪═══════════════╪══════╪════╡
    │{"person_id":1}│{"device_id":1}│0     │1000│
    └───────────────┴───────────────┴──────┴────┘
    

    The above result shows that person_id = 1 has been connected to device_id = 1 from the beginning until now.

    Changing the graph structure

    Consider that the current time point is 30. Now person_id = 1 is disconnecting from device_id = 1, and person_id = 2 will connect to it. To represent this structural change, I will run the query below:

    // Get the current connected person
    MATCH (person1:Person)-[old:CONNECTED_DEVICE {to : 1000}]->(device:Device {device_id : 1})
    // get person_id = 2
    MATCH (person2:Person {person_id : 2}) 
    // set 30 as the end time of the connection between person_id = 1 and device_id = 1
    SET old.to = 30
    // set person_id = 2 as the current connected user to device_id = 1
    // (from time point 31 to now)
    CREATE (person2)-[:CONNECTED_DEVICE {from : 31, to: 1000}]->(device) 
    

    The resultant graph will be:

    After this structural change, the connection history of device_id = 1 will be:

    MATCH (device:Device {device_id : 1})<-[r:CONNECTED_DEVICE]-(person:Person)
    RETURN person AS person, device AS device, r.from AS from, r.to AS to
    ORDER BY r.from
    
    ╒═══════════════╤═══════════════╤══════╤════╕
    │"person"       │"device"       │"from"│"to"│
    ╞═══════════════╪═══════════════╪══════╪════╡
    │{"person_id":1}│{"device_id":1}│0     │30  │
    ├───────────────┼───────────────┼──────┼────┤
    │{"person_id":2}│{"device_id":1}│31    │1000│
    └───────────────┴───────────────┴──────┴────┘
    

    The above result shows that person_id = 1 was connected to device_id = 1 from time point 0 to time point 30, and that person_id = 2 is currently connected to device_id = 1.
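
    The same from and to properties also allow a point-in-time lookup. As a sketch, assuming both bounds are inclusive (which is how the values 30 and 31 are used above), the person connected to device_id = 1 at time point 15 could be found with:

    MATCH (:Device {device_id : 1})<-[r:CONNECTED_DEVICE]-(person:Person)
    // keep only the relationship whose period contains time point 15
    WHERE r.from <= 15 AND 15 <= r.to
    RETURN person

    With the history shown above, this should return person_id = 1, because 15 falls inside the period from 0 to 30.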

    Now the current person connected to device_id = 1 is person_id = 2:

    MATCH (:Device {device_id : 1})<-[:CONNECTED_DEVICE {to : 1000}]-(person:Person)
    RETURN person
    
    ╒═══════════════╕
    │"person"       │
    ╞═══════════════╡
    │{"person_id":2}│
    └───────────────┘
    

    The same approach can be applied to manage the admin history.
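
    For example, a sketch of handing the administrator role from the current administrator to person_id = 3 at time point 50 (both values are picked only for illustration) follows the same close-then-create pattern:

    // Get the current administrator relationship
    MATCH (:Person)-[old:ADMIN {to : 1000}]->(admin:Admin)
    // Get the new administrator
    MATCH (person3:Person {person_id : 3})
    // Close the current administration period at time point 50
    SET old.to = 50
    // Open a new period for person_id = 3 (from time point 51 to now)
    CREATE (person3)-[:ADMIN {from : 51, to : 1000}]->(admin)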

    Obviously this approach has some downsides:

    • Need to manage a set of extra relationships
    • More expensive queries
    • More complex queries

    But if you really need a versioning schema, I believe this approach is a good option or (at least) a good starting point.
