Group together arbitrary date objects that are within a time range of each other

执笔经年 2020-12-15 14:31

I want to split the calendar into two-week intervals starting at 2008-May-5, or any arbitrary starting point.

So I start with several date objects:

3 Answers
  • 2020-12-15 14:43
    import datetime as DT
    import itertools
    
    start_date=DT.date(2008,5,5)
    
    def mkdate(datestring):
        return DT.datetime.strptime(datestring, "%Y-%m-%d").date()
    
    def fortnight(date):
        return (date-start_date).days //14
    
    raw = ("2010-08-01",
           "2010-06-25",
           "2010-07-01",
           "2010-07-08")
    transactions=[(date,"Some data") for date in map(mkdate,raw)]
    # Python 3 removed tuple-unpacking lambdas, so index into the tuple instead.
    transactions.sort(key=lambda t: t[0])
    
    for key,grp in itertools.groupby(transactions,key=lambda t: fortnight(t[0])):
        print(key,list(grp))
    

    yields

    # 55 [(datetime.date(2010, 6, 25), 'Some data')]
    # 56 [(datetime.date(2010, 7, 1), 'Some data'), (datetime.date(2010, 7, 8), 'Some data')]
    # 58 [(datetime.date(2010, 8, 1), 'Some data')]
    

    Note that 2010-6-25 falls in fortnight 55 (counting from 0 at 2008-5-5), while 2010-7-1 falls in fortnight 56. If you want them grouped together, change start_date (to something like 2008-5-16).
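
As a quick check of that suggestion, the sketch below recomputes the fortnight index of the same four dates for both starting points. The helper name fortnight_index is an assumption made for illustration; it is the fortnight function above with the start date passed in explicitly.

```python
import datetime as DT

def fortnight_index(date, start_date):
    # Whole number of 14-day periods between start_date and date.
    return (date - start_date).days // 14

dates = [DT.date(2010, 6, 25), DT.date(2010, 7, 1),
         DT.date(2010, 7, 8), DT.date(2010, 8, 1)]

original = [fortnight_index(d, DT.date(2008, 5, 5)) for d in dates]
shifted = [fortnight_index(d, DT.date(2008, 5, 16)) for d in dates]

print(original)  # [55, 56, 56, 58]
print(shifted)   # [55, 55, 55, 57]
```

With the shifted start, the first three dates share one key and would come out of groupby as a single group.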

    PS. The key tool used above is itertools.groupby, which is explained in detail here.
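
One detail worth spelling out: itertools.groupby only merges adjacent items that share a key, which is why the transactions are sorted first. A minimal sketch with plain integers standing in for day offsets (the values and the period helper are made up for illustration):

```python
import itertools

values = [1, 15, 2, 16]  # 1 and 2 share key 0; 15 and 16 share key 1

def period(x):
    # Index of the 14-unit period that x falls into.
    return x // 14

unsorted_groups = [(k, list(g)) for k, g in itertools.groupby(values, key=period)]
sorted_groups = [(k, list(g)) for k, g in itertools.groupby(sorted(values), key=period)]

print(unsorted_groups)  # [(0, [1]), (1, [15]), (0, [2]), (1, [16])]
print(sorted_groups)    # [(0, [1, 2]), (1, [15, 16])]
```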

    Edit: The lambdas are simply a way to make "anonymous" functions. (They are anonymous in the sense that they are not given names like functions defined by def). Anywhere you see a lambda, it is also possible to use a def to create an equivalent function. For example, you could do this:

    import operator
    transactions.sort(key=operator.itemgetter(0))
    
    def transaction_fortnight(transaction):
        date,data=transaction
        return fortnight(date)
    
    for key,grp in itertools.groupby(transactions,key=transaction_fortnight):
        print(key,list(grp))
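
The group keys above are bare integers. If the calendar dates of each two-week interval are wanted, the index can be converted back. This is only a sketch; fortnight_bounds is a hypothetical helper that is not part of the answer above.

```python
import datetime as DT

start_date = DT.date(2008, 5, 5)

def fortnight_bounds(index, start=start_date):
    # First and last day (inclusive) of the index-th 14-day period.
    first = start + DT.timedelta(days=14 * index)
    last = first + DT.timedelta(days=13)
    return first, last

print(fortnight_bounds(55))
# (datetime.date(2010, 6, 14), datetime.date(2010, 6, 27))
```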
    
  • 2020-12-15 14:57

    Use itertools.groupby with a lambda that divides the distance from the starting point by the length of the period.

    >>> from itertools import groupby
    >>> for i, group in groupby(range(30), lambda x: x // 7):
    ...     print(list(group))
    
    
    [0, 1, 2, 3, 4, 5, 6]
    [7, 8, 9, 10, 11, 12, 13]
    [14, 15, 16, 17, 18, 19, 20]
    [21, 22, 23, 24, 25, 26, 27]
    [28, 29]
    

    So with dates:

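    import datetime as DT
    import itertools as it
    start = DT.date(2008,5,5)
    lenperiod = 14
    
    # groupby only merges adjacent items, so sort by date first
    by_date = sorted(transactions, key=lambda data: data[0])
    for fnight,info in it.groupby(by_date,lambda data: (data[0]-start).days // lenperiod):
        print(list(info))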
    

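    You can also use week numbers from strftime, with lenperiod given in weeks (2 for a fortnight). Note that %W restarts at 00 every January, so a period that spans New Year gets split:

    lenperiod = 2
    for fnight,info in it.groupby(transactions,lambda data: int(data[0].strftime('%W')) // lenperiod):
        print(list(info))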
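
To see how the two keys behave around a year boundary, here is a small sketch comparing them for two dates three days apart. The dates and the helper names by_days and by_week_number are chosen purely for illustration.

```python
import datetime as DT

start = DT.date(2008, 5, 5)
a = DT.date(2010, 12, 30)
b = DT.date(2011, 1, 2)  # three days after a

def by_days(d):
    # Key based on days elapsed since the chosen start date.
    return (d - start).days // 14

def by_week_number(d):
    # %W is the week of the year (Monday first), so it restarts each January.
    return int(d.strftime('%W')) // 2

print(by_days(a), by_days(b))                # same key for both dates
print(by_week_number(a), by_week_number(b))  # different keys
```

The day-based key keeps the two dates together, while the week-number key separates them, so the day-based key is probably the safer default when periods may cross New Year.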
    
  • 2020-12-15 15:01

    Using a pandas DataFrame with resample works too. Take the OP's data, but replace "some data here" with the letters of 'abcd'.

    >>> import datetime as DT
    >>> raw = ("2010-08-01",
    ...        "2010-06-25",
    ...        "2010-07-01",
    ...        "2010-07-08")
    >>> transactions = [(DT.datetime.strptime(datestring, "%Y-%m-%d"), data) for
    ...                 datestring, data in zip(raw,'abcd')]
    >>> transactions
    [(datetime.datetime(2010, 8, 1, 0, 0), 'a'),
     (datetime.datetime(2010, 6, 25, 0, 0), 'b'),
     (datetime.datetime(2010, 7, 1, 0, 0), 'c'),
     (datetime.datetime(2010, 7, 8, 0, 0), 'd')]
    

    Now try using pandas. First create a DataFrame, naming the columns and setting the indices to the dates.

    >>> import pandas as pd
    >>> df = pd.DataFrame(transactions,
    ...                   columns=['date','data']).set_index('date')
    >>> df
               data
    date
    2010-08-01    a
    2010-06-25    b
    2010-07-01    c
    2010-07-08    d
    

    Now use the offset alias '2W-SUN' to resample into two-week bins ending on Sundays. Summing the strings in each bin concatenates them.

    >>> fortnight = df.resample('2W-SUN').sum()
    >>> fortnight
               data
    date
    2010-06-27    b
    2010-07-11   cd
    2010-07-25    0
    2010-08-08    a
    

    Now drill into the data as needed, by bin label (the Sunday that closes each two-week bin)

    >>> fortnight.loc['2010-06-27']['data']
    b
    

    or index

    >>> fortnight.iloc[0]['data']
    b
    

    or indices

    >>> data = fortnight.iloc[:2]['data']
    >>> data
    date
    2010-06-27     b
    2010-07-11    cd
    Freq: 2W-SUN, Name: data, dtype: object
    >>> data.iloc[0]
    b
    >>> data.iloc[1]
    cd
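
The question asks for intervals anchored at an arbitrary starting point, which '2W-SUN' does not give directly. One possible variation, assuming pandas 1.1 or later (where resample accepts an origin argument), is to use 14-day bins aligned to 2008-05-05. This is a sketch rather than part of the answer above, and it counts rows per bin instead of concatenating strings.

```python
import pandas as pd

raw = ["2010-08-01", "2010-06-25", "2010-07-01", "2010-07-08"]
df = pd.DataFrame({"data": list("abcd")},
                  index=pd.to_datetime(raw)).sort_index()

# Assumes pandas >= 1.1, where resample() accepts `origin`.
# '14D' bins are labelled by their first day and aligned to the origin.
counts = df.resample("14D", origin=pd.Timestamp("2008-05-05"))["data"].count()
print(counts)
```

Each label is then the first day of a fortnight counted from 2008-05-05, so the bins line up with the itertools-based answers.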
    