python pandas dataframe slicing by date conditions

佛祖请我去吃肉 2020-11-29 18:27

I am able to read and slice a pandas DataFrame using Python datetime objects; however, I can only use dates that already exist in the index. For example, this works:

4 Answers
  • 2020-11-29 18:58

    Use searchsorted to find the nearest times first, and then use it to slice.

    In [15]: df = pd.DataFrame([1, 2, 3], index=[dt.datetime(2013, 1, 1), dt.datetime(2013, 1, 3), dt.datetime(2013, 1, 5)])
    
    In [16]: df
    Out[16]: 
                0
    2013-01-01  1
    2013-01-03  2
    2013-01-05  3
    
    In [22]: start = df.index.searchsorted(dt.datetime(2013, 1, 2))
    
    In [23]: end = df.index.searchsorted(dt.datetime(2013, 1, 4))
    
    In [24]: df.iloc[start:end]
    Out[24]: 
                0
    2013-01-03  2
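
The session above excludes a row that falls exactly on the end date, because `searchsorted` defaults to `side="left"`. A minimal sketch of an inclusive variant follows; the helper name `slice_between` and the column name `value` are made up for illustration, and the index is assumed to be sorted already.

```python
import pandas as pd

df = pd.DataFrame(
    {"value": [1, 2, 3]},
    index=pd.DatetimeIndex(["2013-01-01", "2013-01-03", "2013-01-05"]),
)


def slice_between(frame, start, end):
    """Return rows with start <= index <= end (hypothetical helper).

    Assumes frame.index is sorted in increasing order.
    """
    # side="left": first position whose label is >= start
    lo = frame.index.searchsorted(start, side="left")
    # side="right": first position whose label is > end, so end is included
    hi = frame.index.searchsorted(end, side="right")
    return frame.iloc[lo:hi]


# Neither 2013-01-02 nor 2013-01-04 needs to exist in the index.
result = slice_between(df, pd.Timestamp("2013-01-02"), pd.Timestamp("2013-01-05"))
print(result)
```

Since `searchsorted` is a binary search, this only gives meaningful positions on a sorted index.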
    
  • 2020-11-29 18:58

    You can use a simple mask to accomplish this:

    date_mask = (data.index > start) & (data.index < end)
    dates = data.index[date_mask]
    # .ix was removed in pandas 1.0; .loc does the same label-based lookup
    data.loc[dates]
    

    By the way, this works for hierarchical indexing as well. In that case data.index would be replaced with data.index.levels[0] or similar.
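
As a rough, self-contained sketch of the mask idea: the frames, the column name `value`, and the level names `date` and `key` below are invented for illustration. For the hierarchical case this sketch uses `get_level_values`, which returns one label per row, so the mask can be passed to `.loc` directly. That is one possible reading of "or similar", not necessarily what the answer had in mind.

```python
import pandas as pd

dates = pd.to_datetime(["2013-01-01", "2013-01-03", "2013-01-05"])
start, end = pd.Timestamp("2013-01-02"), pd.Timestamp("2013-01-06")

# Flat DatetimeIndex: build a boolean mask and select with .loc
flat = pd.DataFrame({"value": [1, 2, 3]}, index=dates)
flat_mask = (flat.index > start) & (flat.index < end)
flat_selected = flat.loc[flat_mask]

# Hierarchical index: compare against the per-row labels of the date level
multi = pd.DataFrame(
    {"value": range(6)},
    index=pd.MultiIndex.from_product([dates, ["a", "b"]], names=["date", "key"]),
)
level_dates = multi.index.get_level_values("date")
multi_mask = (level_dates > start) & (level_dates < end)
multi_selected = multi.loc[multi_mask]

print(flat_selected)
print(multi_selected)
```

Unlike label slicing, a boolean mask does not require the index to be sorted, and `start` and `end` do not have to be present in it.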

  • 2020-11-29 19:07

    I had difficulty with other approaches but I found that the following approach worked for me:

    # Set the Index to be the Date
    df['Date'] = pd.to_datetime(df['Date_1'], format='%d/%m/%Y')
    df.set_index('Date', inplace=True)
    
    # Sort by the datetime index (sorting on the 'Date_1' strings would order by day-of-month first)
    df = df.sort_index()
    
    # Slice the Data
    From = '2017-05-07'
    To   = '2017-06-07'
    df_Z = df.loc[From:To,:]
    
  • 2020-11-29 19:08

    Short answer: sort your data by its index (data = data.sort_index()) and then I think everything will work the way you are expecting.

    Yes, you can slice using datetimes not present in the DataFrame. For example:

    In [12]: df
    Out[12]: 
                       0
    2013-04-20  1.120024
    2013-04-21 -0.721101
    2013-04-22  0.379392
    2013-04-23  0.924535
    2013-04-24  0.531902
    2013-04-25 -0.957936
    
    In [13]: df['20130419':'20130422']
    Out[13]: 
                       0
    2013-04-20  1.120024
    2013-04-21 -0.721101
    2013-04-22  0.379392
    

    As you can see, you don't even have to build datetime objects; strings work.

    Because the datetimes in your index are not in sorted order, slicing behaves unexpectedly. If we shuffle the index of my example here...

    In [17]: df
    Out[17]: 
                       0
    2013-04-22  1.120024
    2013-04-20 -0.721101
    2013-04-24  0.379392
    2013-04-23  0.924535
    2013-04-21  0.531902
    2013-04-25 -0.957936
    

    ...and take the same slice, we get a different result. It returns the first element inside the range and stops at the first element outside the range.

    In [18]: df['20130419':'20130422']
    Out[18]: 
                       0
    2013-04-22  1.120024
    2013-04-20 -0.721101
    2013-04-24  0.379392
    

    This is probably not useful behavior. If you want to select ranges of dates, would it make sense to sort it by date first?

    df = df.sort_index()  # sort_index returns a new, sorted DataFrame
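
To make the sort-then-slice advice concrete, here is a small sketch with made-up data (the names `shuffled`, `ordered`, and `value` are illustrative only). It checks whether the index is sorted, sorts it, and then slices with a start date that is not in the index.

```python
import pandas as pd

# A deliberately shuffled DatetimeIndex
shuffled = pd.DataFrame(
    {"value": [1.0, 2.0, 3.0, 4.0]},
    index=pd.to_datetime(["2013-04-22", "2013-04-20", "2013-04-24", "2013-04-21"]),
)

# Label slicing is only well defined on a sorted (monotonic) index
was_sorted = shuffled.index.is_monotonic_increasing

# sort_index returns a new DataFrame ordered by date
ordered = shuffled.sort_index()

# 2013-04-19 is not in the index; the slice still works and includes both ends
subset = ordered.loc["2013-04-19":"2013-04-22"]
print(subset)
```

Checking `is_monotonic_increasing` before slicing is a cheap way to detect the situation described above without sorting on every call.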
    