Question
I have a pandas DataFrame (df) with many columns, two of which are "Year" and "col_1".
I also have extraction criteria summarised in a list (Criteria):
[1234,5432,...,54353,654,1234].
I would like to extract the rows of this DataFrame that meet any of the following criteria:
((df.Year==1990) & (df.col_1>=Criteria[0])) or
((df.Year==1991) & (df.col_1>=Criteria[1])) or
((df.Year==1992) & (df.col_1>=Criteria[2])) or
...
((df.Year==2010) & (df.col_1>=Criteria[20])) or
((df.Year==2011) & (df.col_1>=Criteria[21]))
Although I can list out all the combinations of these criteria, I would like to do this in one short line, something like:
df = df[df[['col_1','col_2']].apply(lambda x: f(*x), axis=1)]
(from how do you filter pandas dataframes by multiple columns)
Please advise how I can do it. Thank you.
Answer 1:
Sample DataFrame:
import numpy as np
import pandas as pd

df = pd.DataFrame({'col_1': [2000, 1, 54353, 5],
                   'Year': [1990, 1991, 1992, 1993],
                   'a': range(4)})
print(df)
   col_1  Year  a
0   2000  1990  0
1      1  1991  1
2  54353  1992  2
3      5  1993  3
Create a helper dictionary that maps each year to its criterion:
Criteria = [1234,5432,54353,654,1234]
years = np.arange(1990, 1990 + len(Criteria))
d = dict(zip(years, Criteria))
print (d)
{1990: 1234, 1991: 5432, 1992: 54353, 1993: 654, 1994: 1234}
Finally, map the Year column through the dictionary and filter with boolean indexing:
df = df[df['col_1'] >= df['Year'].map(d)]
print (df)
   col_1  Year  a
0   2000  1990  0
2  54353  1992  2
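The same steps can be wrapped into a small reusable function. This is only a minimal sketch: the function name filter_by_year_threshold and its start_year parameter are hypothetical and not part of pandas, and it assumes that Criteria[i] applies to the year start_year + i, as in the question.

```python
import pandas as pd

def filter_by_year_threshold(df, criteria, start_year=1990):
    """Keep rows where col_1 >= the threshold for that row's Year.

    Hypothetical helper: the name and the start_year parameter are
    illustrative assumptions, not an existing pandas API.
    """
    # Threshold i is assumed to apply to year start_year + i.
    thresholds = {start_year + i: c for i, c in enumerate(criteria)}
    # Look up each row's threshold; years missing from the dict become NaN.
    per_row = df['Year'].map(thresholds)
    # Comparisons against NaN evaluate to False, so those rows are dropped.
    return df[df['col_1'] >= per_row]

df = pd.DataFrame({'col_1': [2000, 1, 54353, 5],
                   'Year': [1990, 1991, 1992, 1993],
                   'a': range(4)})
Criteria = [1234, 5432, 54353, 654, 1234]

result = filter_by_year_threshold(df, Criteria)
print(result)
```

Building the dictionary with enumerate avoids the NumPy dependency for this step; the filtering itself is unchanged.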
Detail (evaluated on the original df, before the filtering step above):
print (df['Year'].map(d))
0     1234
1     5432
2    54353
3      654
Name: Year, dtype: int64
print (df['col_1'] >= df['Year'].map(d))
0     True
1    False
2     True
3    False
dtype: bool
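To check that the mapped comparison selects the same rows as the long chain of conditions in the question, both masks can be built and compared. This is a sketch under assumptions: the extra row with Year 2050 is made up for illustration, to show that a year without a criterion is dropped by both forms, and the conditions are combined with | (element-wise or), since Python's plain or does not work on Series.

```python
from functools import reduce
import operator

import pandas as pd

# Sample data from the answer plus one illustrative row (Year 2050)
# that has no matching criterion.
df = pd.DataFrame({'col_1': [2000, 1, 54353, 5, 9999],
                   'Year': [1990, 1991, 1992, 1993, 2050],
                   'a': range(5)})
Criteria = [1234, 5432, 54353, 654, 1234]
years = range(1990, 1990 + len(Criteria))

# Long form: one (Year == y) & (col_1 >= c) mask per year, OR-ed together.
masks = [(df['Year'] == y) & (df['col_1'] >= c)
         for y, c in zip(years, Criteria)]
long_mask = reduce(operator.or_, masks)

# Short form: look up each row's threshold once, then compare.
# Year 2050 maps to NaN, and a comparison with NaN is False.
short_mask = df['col_1'] >= df['Year'].map(dict(zip(years, Criteria)))

print(long_mask.equals(short_mask))
print(df[short_mask])
```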
Source: https://stackoverflow.com/questions/51414814/pandas-filter-by-two-columns-python