Alternatives to awkward Pandas/Python DataFrame indexing: df_REPEATED[df_REPEATED['var']>0]?

感动是毒 2021-01-24 01:25

In Pandas/Python, I have to write the dataframe name twice when conditioning on its own variable:

df_REPEATED[df_REPEATED['var']>0]

This h

2 answers
  •  灰色年华
    2021-01-24 02:21

    Not an official answer... but it already made my life simpler recently:

    https://github.com/toobaz/generic_utils/blob/master/generic_utils/pandas/where.py

    You don't need to download the entire repo: saving the file and doing

    from where import Where as W
    

    should suffice. Then you use it like this:

    import pandas as pd

    df = pd.DataFrame([[1, 2, True],
                       [3, 4, False], 
                       [5, 7, True]],
                      index=range(3), columns=['a', 'b', 'c'])
    # On specific column:
    print(df.loc[W['a'] > 2])
    print(df.loc[-W['a'] == W['b']])
    print(df.loc[~W['c']])
    # On entire DataFrame:
    print(df.loc[W.sum(axis=1) > 3])
    print(df.loc[W[['a', 'b']].diff(axis=1)['b'] > 1])
    

    A slightly less stupid usage example:

    data = pd.read_csv('ugly_db.csv').loc[~(W == '$null$').any(axis=1)]
    

    EDIT: this answer mentions an analogous approach not requiring external components, resulting in:

    data = (pd.read_csv('ugly_db.csv')
              .loc[lambda df : ~(df == '$null$').any(axis=1)])
    

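    and another possibility is to use .pipe(), where the function has to return the filtered DataFrame itself rather than just the boolean mask, as in

    data = (pd.read_csv('ugly_db.csv')
              .pipe(lambda df: df[~(df == '$null$').any(axis=1)]))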
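
For anyone who wants to try the callable-based variants without a CSV file at hand, here is a minimal sketch. The small in-memory frame `raw` and its column names are made up for illustration and stand in for 'ugly_db.csv'; the last lines apply the same idea to the `df_REPEATED['var'] > 0` filter from the question.

```python
import pandas as pd

# Hypothetical stand-in for 'ugly_db.csv'; '$null$' marks a missing value.
raw = pd.DataFrame({
    "name": ["a", "b", "c"],
    "value": ["1", "$null$", "3"],
})

# A callable passed to .loc receives the DataFrame and returns a boolean mask.
via_loc = raw.loc[lambda df: ~(df == "$null$").any(axis=1)]

# A callable passed to .pipe receives the DataFrame and returns the
# filtered frame itself, so the indexing happens inside the lambda.
via_pipe = raw.pipe(lambda df: df[~(df == "$null$").any(axis=1)])

# The filter from the question, written without repeating the frame's name.
df_REPEATED = pd.DataFrame({"var": [-1, 0, 2, 5]})
positive = df_REPEATED.loc[lambda d: d["var"] > 0]
```

Both variants keep the whole expression in one method chain, which is the point when the frame has no name yet, as right after `read_csv`.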
    
