Alternatives to awkward Pandas/Python DataFrame indexing: df_REPEATED[df_REPEATED['var'] > 0]?


感动是毒 2021-01-24 01:25

In Pandas/Python, I have to write the dataframe name twice when conditioning on its own variable:

df_REPEATED[df_REPEATED['var'] > 0]

This h

2 answers

灰色年华 (OP)

2021-01-24 02:21

Not an official answer... but it already made my life simpler recently:

https://github.com/toobaz/generic_utils/blob/master/generic_utils/pandas/where.py

You don't need to download the entire repo: saving the file and doing

from where import Where as W

should suffice. Then you use it like this:

import pandas as pd

df = pd.DataFrame([[1, 2, True],
                   [3, 4, False],
                   [5, 7, True]],
                  index=range(3), columns=['a', 'b', 'c'])
# On specific column:
print(df.loc[W['a'] > 2])
print(df.loc[-W['a'] == W['b']])
print(df.loc[~W['c']])
# On entire DataFrame:
print(df.loc[W.sum(axis=1) > 3])
print(df.loc[W[['a', 'b']].diff(axis=1)['b'] > 1])

A slightly less stupid usage example:

data = pd.read_csv('ugly_db.csv').loc[~(W == '$null$').any(axis=1)]

EDIT: this answer mentions an analogous approach not requiring external components, resulting in:

data = (pd.read_csv('ugly_db.csv')
          .loc[lambda df : ~(df == '$null$').any(axis=1)])
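To try the callable form of .loc without a CSV file, here is a minimal sketch. The small in-memory frame and its column names are made up for illustration and only stand in for 'ugly_db.csv'; the point is that .loc calls the lambda with the frame being indexed, so the frame's name never has to be repeated.

```python
import pandas as pd

# Hypothetical stand-in for the contents of 'ugly_db.csv' (not from the original post).
raw = pd.DataFrame({
    "id": [1, 2, 3],
    "city": ["Rome", "$null$", "Oslo"],
    "score": ["7", "8", "$null$"],
})

# .loc passes the DataFrame it is indexing to the lambda, which returns
# a boolean mask: True for rows that contain no '$null$' cell.
clean = raw.loc[lambda df: ~(df == "$null$").any(axis=1)]

print(clean)
```

Because the lambda receives whatever frame precedes .loc in the chain, this also works directly on the result of pd.read_csv(...) or any other intermediate step.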

and another possibility is to use .pipe(), as in

data = (pd.read_csv('ugly_db.csv')
          .pipe(lambda df : df.loc[~(df == '$null$').any(axis=1)]))
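A small sketch of the .pipe() route, under the assumption that the goal is the filtered frame rather than the boolean mask: the function handed to .pipe() has to do the indexing itself and return the result. The helper name drop_sentinel_rows and the sample data are hypothetical, chosen only for this example.

```python
import pandas as pd


def drop_sentinel_rows(df, sentinel="$null$"):
    """Return only the rows in which no cell equals `sentinel`."""
    mask = ~(df == sentinel).any(axis=1)
    return df.loc[mask]


# Hypothetical sample data standing in for a messy CSV.
raw = pd.DataFrame({"a": ["x", "$null$", "z"], "b": [1, 2, 3]})

# .pipe(f) calls f(raw); extra keyword arguments are forwarded to f.
clean = raw.pipe(drop_sentinel_rows)
without_z = raw.pipe(drop_sentinel_rows, sentinel="z")

print(clean)
print(without_z)
```

Compared with a lambda inside .loc, a named function is easier to reuse and test, at the cost of one more definition.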
