Compare many date ranges in google app engine datastore (Many to many, Python)

Submitted by 耗尽温柔 on 2019-12-23 01:57:11

Question


I have two data sets in the Google App Engine datastore.

class First_Set(db.Model):
  start_time = db.DateTimeProperty()
  end_time = db.DateTimeProperty()
  data1 = db.FloatProperty()
  ...

class Second_Set(db.Model):
  start_time = db.DateTimeProperty()
  end_time = db.DateTimeProperty()
  data2 = db.FloatProperty()
  ...

(They hold other, different data, which is why they are in separate datasets.)

I'd like to find the datastore IDs of all the overlapping start_time and end_time ranges across the two datasets, ideally without pulling the results from one and iterating them over the other.

A great visualization of the initial dataset is from here (it also has the problem solved in SQL):

1     |-----| 
2        |-----| 
3                 |--| 
4                       |-----| 
5                          |-----| 
6                                  |---| 
7                                        |---|  
8                           |---| 
9                                       |-----|

The end result I need is something along the lines of (from the same example):

+----+---------------------+----+---------------------+ 
| id | start               | id | end                 | 
+----+---------------------+----+---------------------+ 
|  2 | 2008-09-01 15:02:00 |  1 | 2008-09-01 15:04:00 | 
|  5 | 2008-09-01 16:19:00 |  4 | 2008-09-01 16:23:00 | 
|  8 | 2008-09-01 16:20:00 |  4 | 2008-09-01 16:22:00 | 
|  8 | 2008-09-01 16:20:00 |  5 | 2008-09-01 16:22:00 | 
|  7 | 2008-09-01 18:18:00 |  9 | 2008-09-01 18:22:00 | 
+----+---------------------+----+---------------------+ 

The SQL solution is described in the example as below, but I couldn't do this in the datastore because it lacks JOIN:

SELECT v1.id, v1.start, v2.id, LEAST(v1.end,v2.end) AS end 
FROM visits v1 
JOIN visits v2 ON v1.id <> v2.id and v1.start >= v2.start and v1.start < v2.end  
ORDER BY v1.start;
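
For comparison, here is a rough in-memory equivalent of that join condition for two separate sets, assuming both have already been fetched. It is only a baseline sketch (quadratic in the number of ranges), not something the datastore can run as a query. The Interval tuple and the sample times are made up for illustration; ranges are treated as half-open, so two ranges that merely touch do not count as overlapping.

```python
from collections import namedtuple
from datetime import datetime

# Hypothetical stand-in for a datastore entity: an id plus a [start, end) range.
Interval = namedtuple("Interval", ["id", "start", "end"])


def overlapping_pairs(first, second):
    """Return (first_id, second_id, overlap_start, overlap_end) for every overlapping pair."""
    pairs = []
    for a in first:
        for b in second:
            # Two ranges overlap when each one starts before the other ends.
            if a.start < b.end and b.start < a.end:
                pairs.append((a.id, b.id, max(a.start, b.start), min(a.end, b.end)))
    return pairs


# Made-up sample data, one range per set.
first = [Interval(1, datetime(2008, 9, 1, 15, 0), datetime(2008, 9, 1, 15, 4))]
second = [Interval(2, datetime(2008, 9, 1, 15, 2), datetime(2008, 9, 1, 15, 6))]
print(overlapping_pairs(first, second))
```

The overlap itself runs from the later of the two starts to the earlier of the two ends, which is what LEAST(v1.end, v2.end) computes on the SQL side.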

I understand that the one-to-many version of this is rather straightforward using a ListProperty() (from this question).

Can anyone think of a solution to find the overlapping times (ideally in Python)?


Answer 1:


Look into Marzullo's algorithm; its time efficiency is O(n log n).
There are also many questions on Stack Overflow that cover overlapping intervals, which can be used to solve your problem on App Engine.
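
A minimal sketch of the sweep-line idea behind that suggestion, in plain Python with no datastore calls. This is not Marzullo's algorithm itself (which finds the best-agreed interval) but a related sweep over sorted start and end events that reports every cross-set overlapping pair. The function name and the (key, start, end) tuple shape are assumptions made for illustration, and ranges are assumed to be half-open with start before end.

```python
from datetime import datetime


def find_overlaps(first, second):
    """Sweep over sorted endpoints and report every cross-set overlap.

    first, second: iterables of (key, start, end) tuples.
    Returns a list of (first_key, second_key, overlap_start, overlap_end).
    """
    events = []
    for set_id, items in enumerate((first, second)):
        for key, start, end in items:
            if start >= end:
                continue  # skip empty or inverted ranges
            # kind 0 = end, 1 = start: at equal timestamps ends are processed
            # first, so ranges that only touch are not reported as overlapping.
            events.append((start, 1, set_id, key, end))
            events.append((end, 0, set_id, key, end))
    # Sort on time and kind only, so keys never need to be comparable.
    events.sort(key=lambda e: (e[0], e[1]))

    active = ({}, {})  # per set: key -> end time of ranges currently open
    overlaps = []
    for time, kind, set_id, key, end in events:
        if kind == 0:
            active[set_id].pop(key, None)
            continue
        # Every range still open in the other set overlaps the one starting now.
        for other_key, other_end in active[1 - set_id].items():
            pair = (key, other_key) if set_id == 0 else (other_key, key)
            overlaps.append(pair + (time, min(end, other_end)))
        active[set_id][key] = end
    return overlaps


def t(minute):
    """Made-up helper: a timestamp on an arbitrary sample day."""
    return datetime(2008, 9, 1, 15, minute)


sample_first = [(1, t(0), t(10)), (2, t(20), t(30))]
sample_second = [("a", t(5), t(25)), ("b", t(30), t(40))]
for row in find_overlaps(sample_first, sample_second):
    print(row)
```

Sorting dominates the cost at O(n log n), plus one step per reported pair. Both sets still have to be read from the datastore once, but neither is iterated against the other.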




Answer 2:


Posting my solution with no JOINs, thanks to Shay's direction. It should be able to find overlaps over any number of datasets with minor edits (at least that's the theory).

My Python isn't that great, but the code below should give the idea:

from operator import itemgetter

import webapp2
from google.appengine.ext import db

class Find_Overlaps(webapp2.RequestHandler):
    def get(self):
        all_dates = []
        first_dates = db.GqlQuery("SELECT * FROM First_Set")
        for date in first_dates:
            row = {'dataset':'First_Set', 'dbkey':date.key(), 'offset':date.start_time, 'type': -1}
            all_dates.append(row)
            row = {'dataset':'First_Set', 'dbkey':date.key(), 'offset':date.end_time, 'type': 1}
            all_dates.append(row)

        second_dates = db.GqlQuery("SELECT * FROM Second_Set")
        for date in second_dates:
            row = {'dataset':'Second_Set', 'dbkey':date.key(), 'offset':date.start_time, 'type': -1}
            all_dates.append(row)
            row = {'dataset':'Second_Set', 'dbkey':date.key(), 'offset':date.end_time, 'type': 1}
            all_dates.append(row)

        newlist = sorted(all_dates, key=itemgetter('offset','type'))
        number_datasets = 2  # needed because the goal is to find overlaps across all sets, not only the best overlap
        loopcnt = 0
        update_bestend = 0
        overlaps = []
        for row in newlist:  # below is mostly from Marzullo's algorithm
            loopcnt = loopcnt - row['type']  # running tally: +1 at a start (type -1), -1 at an end (type 1)
            if update_bestend == 1:
                if loopcnt == (number_datasets - 1):
                    bestend = row['offset']
                    end_set = row['dataset']
                    end_key = row['dbkey']
                    overlaps.append({'start':beststart,'start_set':start_set,'start_key':start_key,'end':bestend,'end_set':end_set,'end_key':end_key})
                    update_bestend = 0
            if loopcnt == number_datasets:
                beststart = row['offset']
                start_set = row['dataset']
                start_key = row['dbkey']
                update_bestend = 1

        for overlap in overlaps: #just to see what the outcome is
            self.response.out.write('start: %s, start_set: %s, end: %s, end_set: %s<br>' % (overlap['start'], overlap['start_set'], overlap['end'], overlap['end_set']))


Source: https://stackoverflow.com/questions/11161238/compare-many-date-ranges-in-google-app-engine-datastore-many-to-many-python
