问题
Following up this question where a pandas data frame is subset by one string variable and one datetime variable using idx.min, how could we subset by two date time variables? For the example data frame below, how would we subset rows from class == C, with the minimum base_date and the maximum date_2 date? [answer would be row 3]:
print(example)
slot_id class day base_date date_2
0 1 A Monday 2019-01-21 2019-01-24
1 2 B Tuesday 2019-01-22 2019-01-23
2 3 C Wednesday 2019-01-22 2019-01-24
3 4 C Wednesday 2019-01-22 2019-01-26
4 5 C Wednesday 2019-01-24 2019-01-25
5 6 C Thursday 2019-01-24 2019-01-22
6 7 D Tuesday 2019-01-23 2019-01-24
7 8 E Thursday 2019-01-24 2019-01-30
8 9 F Saturday 2019-01-26 2019-01-31
For just class == "C" with the minimum base_date we can use:
df.iloc[pd.to_datetime(df.loc[df['class'] == 'C', 'base_date']).idxmin()]
However, if we had 2 or more date variables with conditions like max/min, would the index solution still be practical? Doesn't index subsetting with 2 or more variable imply nesting df.iloc? Is this the only way to do the subset with 2 or more datetime variables?
Data:
print(example.to_dict())
{'slot_id': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 7, 7: 8, 8: 9}, 'class': {0: 'A', 1: 'B', 2: 'C', 3: 'C', 4: 'C', 5: 'C', 6: 'D', 7: 'E', 8: 'F'}, 'day': {0: 'Monday', 1: 'Tuesday', 2: 'Wednesday', 3: 'Wednesday', 4: 'Wednesday', 5: 'Thursday', 6: 'Tuesday', 7: 'Thursday', 8: 'Saturday'}, 'base_date': {0: datetime.date(2019, 1, 21), 1: datetime.date(2019, 1, 22), 2: datetime.date(2019, 1, 22), 3: datetime.date(2019, 1, 22), 4: datetime.date(2019, 1, 24), 5: datetime.date(2019, 1, 24), 6: datetime.date(2019, 1, 23), 7: datetime.date(2019, 1, 24), 8: datetime.date(2019, 1, 26)}, 'date_2': {0: datetime.date(2019, 1, 24), 1: datetime.date(2019, 1, 23), 2: datetime.date(2019, 1, 24), 3: datetime.date(2019, 1, 26), 4: datetime.date(2019, 1, 25), 5: datetime.date(2019, 1, 22), 6: datetime.date(2019, 1, 24), 7: datetime.date(2019, 1, 30), 8: datetime.date(2019, 1, 31)}}
Data pretreatment:
example = pd.DataFrame(example)
example['base_date'] = pd.to_datetime(example['base_date'].astype(str), format='%d%m%Y')
example['base_date'] = example['base_date'].dt.date
example['date_2'] = pd.to_datetime(example['date_2'].astype(str), format='%d%m%Y')
example['date_2'] = example['date_2'].dt.date
回答1:
You can using transform
yourdf=example[example['base_date']==example.groupby('class')['base_date'].transform('min')]
If only for columns C
yourdf.loc[yourdf['class']=='C',:]
Also idxmin or idxmax will only return the first index met the min or max condition , so when there is multiple max or min values , they are still only show one index
来源:https://stackoverflow.com/questions/54451643/subset-pandas-data-frame-with-datetime-columns