Pandas: Filter by values within multiple columns

為{幸葍}努か posted on 2021-02-09 10:50:08

Question


I'm trying to filter a DataFrame on the values in several columns using a single condition, while keeping the other columns that the filter should not apply to.

I've reviewed these answers, with the third being the closest, but still no luck:

  • how do you filter pandas dataframes by multiple columns
  • Filtering multiple columns Pandas
  • Python Pandas - How to filter multiple columns by one value

Setup:

import pandas as pd

df = pd.DataFrame({
        'month':[1,1,1,2,2],
        'a':['A','A','A','A','NONE'],
        'b':['B','B','B','B','B'],
        'c':['C','C','C','NONE','NONE']
    }, columns = ['month','a','b','c'])

# Columns to keep; only the rows for the latest month are selected
l = ['month','a','c']
df = df.loc[df['month'] == df['month'].max(), df.columns.isin(l)].reset_index(drop = True)

Current Output:

   month     a     c
0      2     A  NONE
1      2  NONE  NONE

Desired Output:

   month     a
0      2     A
1      2  NONE

I've tried:

sub = l[1:]
df = df[(df.loc[:, sub] != 'NONE').any(axis = 1)]

and many other variations (.all(), [sub, :], ~df.loc[...], (axis = 0)), but all with no luck.

Basically I want to drop any column (within the sub list) that has all 'NONE' values in it.

Any help is much appreciated.


Answer 1:


First replace 'NONE' with np.nan so that dropna recognizes it as a null value. Then use loc with your boolean Series and the column subset l. Finally, call dropna with axis=1 and how='all' to drop every column that is entirely null.

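import numpy as np
# 'NONE' -> NaN, keep latest-month rows and the columns in l, drop all-NaN columns
df.replace('NONE', np.nan) \
    .loc[df.month == df.month.max(), l].dropna(axis=1, how='all')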

   month    a
3      2    A
4      2  NaN
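
Because replace converts every 'NONE' to NaN, the 'NONE' that remains in column a is shown as NaN in that result. If the original strings should be preserved, as in the desired output from the question, one possible variation is to work out which columns are entirely 'NONE' and drop only those. The following is a minimal sketch rather than part of the original answer; the names latest, all_none and result are made up for illustration, and it assumes the original df from the setup (before it was reassigned).

```python
import pandas as pd

df = pd.DataFrame({
    'month': [1, 1, 1, 2, 2],
    'a': ['A', 'A', 'A', 'A', 'NONE'],
    'b': ['B', 'B', 'B', 'B', 'B'],
    'c': ['C', 'C', 'C', 'NONE', 'NONE'],
})

l = ['month', 'a', 'c']
sub = l[1:]  # columns that the 'NONE' check applies to

# Rows for the latest month, restricted to the columns of interest.
latest = df.loc[df['month'] == df['month'].max(), l]

# Boolean Series indexed by column name: True where every value is 'NONE'.
all_none = (latest[sub] == 'NONE').all(axis=0)

# Drop only the columns flagged above; the remaining strings are left untouched.
result = latest.drop(columns=all_none[all_none].index).reset_index(drop=True)
print(result)
```

Note the axis=0 in all(): it reduces down each column, so the mask selects columns. The attempt in the question used any(axis=1), which reduces across each row and therefore filters rows instead.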


Source: https://stackoverflow.com/questions/44416069/pandas-filter-by-values-within-multiple-columns
