Subset pandas data frame with datetime columns

Question

Following up on this question, where a pandas data frame is subset by one string variable and one datetime variable using idxmin, how could we subset by two datetime variables? For the example data frame below, how would we select the rows with class == C that have the minimum base_date and the maximum date_2? [The answer would be row 3.]

print(example)
   slot_id class        day   base_date      date_2
0        1     A     Monday  2019-01-21  2019-01-24
1        2     B    Tuesday  2019-01-22  2019-01-23
2        3     C  Wednesday  2019-01-22  2019-01-24
3        4     C  Wednesday  2019-01-22  2019-01-26
4        5     C  Wednesday  2019-01-24  2019-01-25
5        6     C   Thursday  2019-01-24  2019-01-22
6        7     D    Tuesday  2019-01-23  2019-01-24
7        8     E   Thursday  2019-01-24  2019-01-30
8        9     F   Saturday  2019-01-26  2019-01-31

For just class == "C" with the minimum base_date we can use:

df.loc[pd.to_datetime(df.loc[df['class'] == 'C', 'base_date']).idxmin()]

However, if we had two or more date variables with conditions like max/min, would the index solution still be practical? Doesn't index subsetting with two or more variables imply nesting these index lookups? Is this the only way to subset on two or more datetime variables?

Data:

print(example.to_dict())
{'slot_id': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 7, 7: 8, 8: 9}, 'class': {0: 'A', 1: 'B', 2: 'C', 3: 'C', 4: 'C', 5: 'C', 6: 'D', 7: 'E', 8: 'F'}, 'day': {0: 'Monday', 1: 'Tuesday', 2: 'Wednesday', 3: 'Wednesday', 4: 'Wednesday', 5: 'Thursday', 6: 'Tuesday', 7: 'Thursday', 8: 'Saturday'}, 'base_date': {0: datetime.date(2019, 1, 21), 1: datetime.date(2019, 1, 22), 2: datetime.date(2019, 1, 22), 3: datetime.date(2019, 1, 22), 4: datetime.date(2019, 1, 24), 5: datetime.date(2019, 1, 24), 6: datetime.date(2019, 1, 23), 7: datetime.date(2019, 1, 24), 8: datetime.date(2019, 1, 26)}, 'date_2': {0: datetime.date(2019, 1, 24), 1: datetime.date(2019, 1, 23), 2: datetime.date(2019, 1, 24), 3: datetime.date(2019, 1, 26), 4: datetime.date(2019, 1, 25), 5: datetime.date(2019, 1, 22), 6: datetime.date(2019, 1, 24), 7: datetime.date(2019, 1, 30), 8: datetime.date(2019, 1, 31)}}

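Data preprocessing:

import pandas as pd

# Rebuild the frame from the dict above, then normalise both date columns to datetime.date.
# The stringified dates look like '2019-01-21', so the format is '%Y-%m-%d'.
example = pd.DataFrame(example)
example['base_date'] = pd.to_datetime(example['base_date'].astype(str), format='%Y-%m-%d')
example['base_date'] = example['base_date'].dt.date
example['date_2'] = pd.to_datetime(example['date_2'].astype(str), format='%Y-%m-%d')
example['date_2'] = example['date_2'].dt.date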

Answer 1:

You can use transform to keep the rows whose base_date equals the minimum within their class:

yourdf=example[example['base_date']==example.groupby('class')['base_date'].transform('min')]

If you only need class C:

yourdf.loc[yourdf['class']=='C',:]
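The filter above handles only base_date. As a minimal sketch of how the second condition might be added, the snippet below chains boolean masks. It assumes the intended reading is "earliest base_date first, then latest date_2 among those rows"; the names class_c, earliest, result and alt are illustrative and do not come from the original answer, and the day column is left out for brevity.

```python
import pandas as pd

# Rebuild the question's example frame (the 'day' column is omitted for brevity).
example = pd.DataFrame({
    'slot_id': [1, 2, 3, 4, 5, 6, 7, 8, 9],
    'class': ['A', 'B', 'C', 'C', 'C', 'C', 'D', 'E', 'F'],
    'base_date': pd.to_datetime([
        '2019-01-21', '2019-01-22', '2019-01-22', '2019-01-22', '2019-01-24',
        '2019-01-24', '2019-01-23', '2019-01-24', '2019-01-26']),
    'date_2': pd.to_datetime([
        '2019-01-24', '2019-01-23', '2019-01-24', '2019-01-26', '2019-01-25',
        '2019-01-22', '2019-01-24', '2019-01-30', '2019-01-31']),
})

# Step 1: keep only class C.
class_c = example[example['class'] == 'C']
# Step 2: keep the rows (possibly several) with the earliest base_date.
earliest = class_c[class_c['base_date'] == class_c['base_date'].min()]
# Step 3: among those, keep the rows with the latest date_2.
result = earliest[earliest['date_2'] == earliest['date_2'].max()]
print(result)

# Alternative under the same assumption: sort and take the first row.
# Unlike the masks above, this returns a single row even when rows tie.
alt = class_c.sort_values(['base_date', 'date_2'], ascending=[True, False]).head(1)
print(alt)
```

Both approaches select the row with index 3 (slot_id 4) for this data. The mask version keeps every tied row, while the sort version always returns exactly one.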

Also note that idxmin and idxmax return only the first index that meets the min or max condition, so when several rows tie for the minimum or maximum, only one index is returned.
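To illustrate the tie behaviour, here is a small sketch using the base_date values of three of the class C rows. The series name dates and the variables first_only and all_ties are made up for this example.

```python
import pandas as pd

# base_date values for rows 2, 3 and 4 of the example; rows 2 and 3 tie for the minimum.
dates = pd.Series(
    pd.to_datetime(['2019-01-22', '2019-01-22', '2019-01-24']),
    index=[2, 3, 4],
)

# idxmin reports only the first label at which the minimum occurs.
first_only = dates.idxmin()

# Comparing against the minimum keeps every tied label.
all_ties = dates[dates == dates.min()].index.tolist()

print(first_only)  # 2
print(all_ties)    # [2, 3]
```

This is why comparing against min() or a transform('min') result is the safer choice whenever ties should be kept for a later condition.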

Source: https://stackoverflow.com/questions/54451643/subset-pandas-data-frame-with-datetime-columns

Tags

python

pandas

date

dataframe

subset