pandas: filter rows of DataFrame with operator chaining

悲哀的现实 2020-11-22 16:46

Most operations in pandas can be accomplished with operator chaining (groupby, aggregate, apply, etc), but the only way I

14 answers

余生分开走 (original poster)

2020-11-22 17:22
This solution is more hackish in terms of implementation, but I find it much cleaner in terms of usage, and it is certainly more general than the others proposed.

https://github.com/toobaz/generic_utils/blob/master/generic_utils/pandas/where.py

You don't need to download the entire repo: saving the file and doing
```
from where import where as W
```
should suffice. Then you use it like this:
```
import pandas as pd

df = pd.DataFrame([[1, 2, True],
                   [3, 4, False],
                   [5, 7, True]],
                  index=range(3), columns=['a', 'b', 'c'])
# On specific column:
print(df.loc[W['a'] > 2])
print(df.loc[-W['a'] == W['b']])
print(df.loc[~W['c']])
# On the entire DataFrame, or on a subset of its columns:
print(df.loc[W.sum(axis=1) > 3])
print(df.loc[W[['a', 'b']].diff(axis=1)['b'] > 1])
```
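For comparison, here is a rough sketch of similar filters written with plain pandas only, on the assumption that `.loc` accepts a callable which receives the DataFrame being indexed. It does not use the `W` helper and is not meant to reproduce its behaviour exactly. The result names (`gt2`, `not_c`, `sum_gt3`, `diff_gt1`) are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, True],
                   [3, 4, False],
                   [5, 7, True]],
                  index=range(3), columns=['a', 'b', 'c'])

# Each lambda is called with the frame that .loc is indexing,
# so the filter can sit in the middle of a chain.
gt2 = df.loc[lambda d: d['a'] > 2]
not_c = df.loc[lambda d: ~d['c']]

# Conditions computed from several columns at once
# (restricted to the numeric columns 'a' and 'b' here).
sum_gt3 = df.loc[lambda d: d[['a', 'b']].sum(axis=1) > 3]
diff_gt1 = df.loc[lambda d: d[['a', 'b']].diff(axis=1)['b'] > 1]

print(gt2, not_c, sum_gt3, diff_gt1, sep='\n\n')
```

The lambdas are more verbose than `W[...]`, but they need no extra file.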
A slightly less stupid usage example:
```
data = pd.read_csv('ugly_db.csv').loc[~(W == '$null$').any(axis=1)]
```
By the way: even when you are only filtering on boolean columns,
```
df.loc[W['cond1']].loc[W['cond2']]
```
can be much more efficient than
```
df.loc[W['cond1'] & W['cond2']]
```
because it evaluates cond2 only where cond1 is True.
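To make that point concrete, here is a small sketch using plain pandas callables rather than `W` (the helper `cond2` and the list `rows_seen` are hypothetical names introduced only for this illustration). It records how many rows the second condition is evaluated on in each form.

```python
import pandas as pd

df = pd.DataFrame({'cond1': [True, False, True, False],
                   'cond2': [True, True, False, False]})

rows_seen = []

def cond2(d):
    # Record how many rows this condition is evaluated on.
    rows_seen.append(len(d))
    return d['cond2']

# Chained: cond2 only sees the rows that passed cond1.
chained = df.loc[lambda d: d['cond1']].loc[cond2]

# Combined: cond2 is evaluated on the full frame.
combined = df.loc[lambda d: d['cond1'] & cond2(d)]

print(rows_seen)
print(chained.equals(combined))
```

Both forms select the same rows. The difference is only in how much data the second condition has to look at, which should matter most when that condition is costly to compute.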

DISCLAIMER: I first gave this answer elsewhere because I hadn't seen this.