I am trying to filter a pandas data frame using thresholds for three columns
import pandas as pd
df = pd.DataFrame({\"A\" : [6, 2, 10, -5, 3],
If you're trying to build a dynamic query, there are easier ways. Here's one using a list comprehension and str.join
:
query = ' & '.join(['{}>{}'.format(k, v) for k, v in limits_dic.items()])
Or, using f
-strings with python-3.6+,
query = ' & '.join([f'{k}>{v}' for k, v in limits_dic.items()])
print(query)
'A>0 & C>-1 & B>2'
Pass the query string to df.query
, it's meant for this very purpose:
out = df.query(query)
print(out)
A B C
1 2 5 2
2 10 3 1
4 3 6 2
From pandas 0.25, you can wrap your column name in backticks so this works:
query = ' & '.join([f'`{k}`>{v}' for k, v in limits_dic.items()])
See this Stack Overflow post for more.
You could also use df.eval
if you want to obtain a boolean mask for your query, and then indexing becomes straightforward after that:
mask = df.eval(query)
print(mask)
0 False
1 True
2 True
3 False
4 True
dtype: bool
out = df[mask]
print(out)
A B C
1 2 5 2
2 10 3 1
4 3 6 2
If you need to query columns that use string data, the code above will need a slight modification.
Consider (data from this answer):
df = pd.DataFrame({'gender':list('MMMFFF'),
'height':[4,5,4,5,5,4],
'age':[70,80,90,40,2,3]})
print (df)
gender height age
0 M 4 70
1 M 5 80
2 M 4 90
3 F 5 40
4 F 5 2
5 F 4 3
And a list of columns, operators, and values:
column = ['height', 'age', 'gender']
equal = ['>', '>', '==']
condition = [1.68, 20, 'F']
The appropriate modification here is:
query = ' & '.join(f'{i} {j} {repr(k)}' for i, j, k in zip(column, equal, condition))
df.query(query)
age gender height
3 40 F 5
For information on the pd.eval()
family of functions, their features and use cases, please visit Dynamic Expression Evaluation in pandas using pd.eval().
An alternative to both posted, that may or may not be more pythonic:
import pandas as pd
import operator
from functools import reduce
df = pd.DataFrame({"A": [6, 2, 10, -5, 3],
"B": [2, 5, 3, 2, 6],
"C": [-5, 2, 1, 8, 2]})
limits_dic = {"A": 0, "B": 2, "C": -1}
# equiv to [df['A'] > 0, df['B'] > 2 ...]
loc_elements = [df[key] > val for key, val in limits_dic.items()]
df = df.loc[reduce(operator.and_, loc_elements)]
An alternative to @coldspeed 's version:
conditions = None
for key, val in limit_dic.items():
cond = df[key] > val
if conditions is None:
conditions = cond
else:
conditions = conditions & cond
print(df[conditions])
How I do this without creating a string and df.query
:
limits_dic = {"A" : 0, "B" : 2, "C" : -1}
cond = None
# Build the conjunction one clause at a time
for key, val in limits_dic.items():
if cond is None:
cond = df[key] > val
else:
cond = cond & (df[key] > val)
df.loc[cond]
A B C
0 2 5 2
1 10 3 1
2 3 6 2
Note the hard coded (>, &)
operators (since I wanted to follow your example exactly).