Question

I am having trouble dropping all instances of a given series from the whole DF based on a .dropna(thresh=x) condition, which I thought had previously been resolved.

Dataframe:

Note that it is multi-indexed:

          2001     2002     2003    2004

bob   A   123      31       4        12
bob   B   41        1       56       13
bob   C   nan      nan      4        nan

bill  A   451      8        nan      24
bill  B   32       5        52        6
bill  C   623      12       41       14

#Repeating features (A,B,C) for each index/name
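
For anyone who wants to reproduce this, the frame above can be rebuilt roughly as follows. This is a minimal sketch: the integer column labels and the unnamed index levels are assumptions, since the printed table does not show dtypes or level names.

```python
import numpy as np
import pandas as pd

# Two-level index: name (bob, bill) x feature (A, B, C).
# The levels are left unnamed (an assumption).
index = pd.MultiIndex.from_product([["bob", "bill"], ["A", "B", "C"]])

df = pd.DataFrame(
    [
        [123, 31, 4, 12],
        [41, 1, 56, 13],
        [np.nan, np.nan, 4, np.nan],  # bob / C: only one non-NaN value
        [451, 8, np.nan, 24],
        [32, 5, 52, 6],
        [623, 12, 41, 14],
    ],
    index=index,
    columns=[2001, 2002, 2003, 2004],  # integer labels are an assumption
)

print(df)
```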

Calling df.dropna(thresh=2, inplace=True) drops only the one row/instance where the thresh condition is met (a row with fewer than 2 non-NaN values), and leaves the other instances of that feature.

What I want is to drop the series from the entire df if the thresh condition is met for any one of its rows, such as:

           2001     2002     2003    2004

bob    A    123      31       4        12
bob    B    41        1       56       13

bill   A    451      8        nan      24
bill   B    32       5        52        6

#Drops C from the whole df
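
To make the gap between the two behaviours concrete, here is a small sketch of what a plain row-wise dropna does on the sample data (same assumed column labels as in the table above; `row_wise` is just an illustrative name):

```python
import numpy as np
import pandas as pd

# Sample frame from the question (integer column labels are an assumption).
index = pd.MultiIndex.from_product([["bob", "bill"], ["A", "B", "C"]])
df = pd.DataFrame(
    [[123, 31, 4, 12], [41, 1, 56, 13], [np.nan, np.nan, 4, np.nan],
     [451, 8, np.nan, 24], [32, 5, 52, 6], [623, 12, 41, 14]],
    index=index,
    columns=[2001, 2002, 2003, 2004],
)

# thresh=2 keeps every row that has at least 2 non-NaN values,
# so only the ("bob", "C") row is removed.
row_wise = df.dropna(thresh=2)

# ("bill", "C") is still listed, which is the behaviour I want to avoid.
print(row_wise.index.tolist())
```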

The solution I am using:

m = df.notna().sum(1).groupby(level=1).transform(lambda x: x.ge(2).all())
df_final = df[m]

This does not seem to work for the entire DF.
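
As a sanity check, here is a sketch of the same mask idea run on the sample data only. It is a variant rather than the exact line above: it swaps the lambda for the built-in "all" aggregation, and the name `ok_row` is made up for illustration. On the sample it leaves just the four A and B rows, so a different result on the real DF may come from how the real index is laid out.

```python
import numpy as np
import pandas as pd

# Sample frame from the question (integer column labels are an assumption).
index = pd.MultiIndex.from_product([["bob", "bill"], ["A", "B", "C"]])
df = pd.DataFrame(
    [[123, 31, 4, 12], [41, 1, 56, 13], [np.nan, np.nan, 4, np.nan],
     [451, 8, np.nan, 24], [32, 5, 52, 6], [623, 12, 41, 14]],
    index=index,
    columns=[2001, 2002, 2003, 2004],
)

# True for rows that have at least 2 non-NaN values.
ok_row = df.notna().sum(axis=1).ge(2)

# Broadcast per feature (index level 1): a feature stays only if
# every one of its rows passes the check.
m = ok_row.groupby(level=1).transform("all")

df_final = df[m]
print(df_final)
```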

I believe I am just not applying it correctly. Any advice would be appreciated on how to fully implement this, or the other solution:

a = df.notna().sum(1).lt(2).loc[lambda x: x].index.get_level_values(1)
df_final = df.query('ilevel_1 not in @a')

Please note that in the actual DF, more than one series will meet the NaN threshold and will therefore need to be removed.
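
To illustrate the case with more than one series, here is a sketch of the second approach written with isin instead of query. The extra all-NaN row for ("bill", "B") is hypothetical and is added only so that two series (B and C) fail the threshold; the names `low`, `bad` and `keep` are also made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample frame from the question (integer column labels are an assumption).
index = pd.MultiIndex.from_product([["bob", "bill"], ["A", "B", "C"]])
df = pd.DataFrame(
    [[123, 31, 4, 12], [41, 1, 56, 13], [np.nan, np.nan, 4, np.nan],
     [451, 8, np.nan, 24], [32, 5, 52, 6], [623, 12, 41, 14]],
    index=index,
    columns=[2001, 2002, 2003, 2004],
)

# Hypothetical change, not in the original data: blank out ("bill", "B")
# so that a second series also fails the threshold.
df.loc[("bill", "B"), :] = np.nan

# Rows with fewer than 2 non-NaN values.
low = df.notna().sum(axis=1).lt(2)

# Level-1 labels (series) that have at least one such row.
bad = low[low].index.get_level_values(1).unique()

# Keep only rows whose level-1 label is not among the failing series.
keep = ~df.index.get_level_values(1).isin(bad)
df_final = df[keep]
print(df_final)
```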

Further Explanation on Expected Result:

using

from collections import Counter

# series is index level 1 (A, B, C, etc.)
counts = Counter(df.index.get_level_values(1))
pd.DataFrame(list(counts.keys()), index=list(counts.values()))

I would expect an output of:

2   A
2   B
...

#Where the count of the series keys is the same for each series

Source: https://stackoverflow.com/questions/59621000/drop-series-from-entire-df-if-row-has-at-least-2-nan-values

Tags

python

pandas

dataframe

nan

Drop Series from Entire DF if Row has at least 2 NaN values