Question
I'm having an issue with dropping all instances of a given series from the whole DataFrame based on .dropna(thresh=x), which I thought had been previously resolved.
DataFrame (note that it is MultiIndexed):
          2001  2002  2003  2004
bob  A     123    31     4    12
bob  B      41     1    56    13
bob  C     nan   nan     4   nan
bill A     451     8   nan    24
bill B      32     5    52     6
bill C     623    12    41    14
#Repeating features (A,B,C) for each index/name
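For anyone wanting to reproduce this, the frame above could be rebuilt roughly as follows. This is only a sketch: it assumes the two index levels are unnamed and that the year columns are integers, neither of which is stated in the question.

```python
import numpy as np
import pandas as pd

# Assumption: two unnamed index levels (person name, then feature label).
index = pd.MultiIndex.from_product([["bob", "bill"], ["A", "B", "C"]])

# Assumption: the year columns are plain integers.
df = pd.DataFrame(
    [
        [123, 31, 4, 12],
        [41, 1, 56, 13],
        [np.nan, np.nan, 4, np.nan],
        [451, 8, np.nan, 24],
        [32, 5, 52, 6],
        [623, 12, 41, 14],
    ],
    index=index,
    columns=[2001, 2002, 2003, 2004],
)
print(df)
```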
Calling df.dropna(thresh=x) drops only the one row that fails the thresh condition, but leaves the other instances of that feature. I want to drop the series from the entire df if any one of its rows fails the thresh condition. For example, the result I want from df.dropna(thresh=2, inplace=True) is:
          2001  2002  2003  2004
bob  A     123    31     4    12
bob  B      41     1    56    13
bill A     451     8   nan    24
bill B      32     5    52     6
#Drops C from the whole df
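For contrast, the row-wise behaviour described above can be checked on the example data. The sketch below rebuilds the frame under the same assumptions as the question's listing (unnamed index levels, integer year columns, both assumptions) and shows that a plain dropna(thresh=2) removes only the single offending row.

```python
import numpy as np
import pandas as pd

# Example data from the question; level names and column dtype are assumed.
index = pd.MultiIndex.from_product([["bob", "bill"], ["A", "B", "C"]])
df = pd.DataFrame(
    [
        [123, 31, 4, 12],
        [41, 1, 56, 13],
        [np.nan, np.nan, 4, np.nan],
        [451, 8, np.nan, 24],
        [32, 5, 52, 6],
        [623, 12, 41, 14],
    ],
    index=index,
    columns=[2001, 2002, 2003, 2004],
)

# thresh=2 keeps every row that has at least 2 non-NaN values.
row_wise = df.dropna(thresh=2)

# Only ("bob", "C") is removed; ("bill", "C") is still present.
print(row_wise.index.tolist())
```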
The solution I am using:
m = df.notna().sum(1).groupby(level=1).transform(lambda x: x.ge(2).all())
df_final = df[m]
This does not seem to work for the entire DataFrame.
I believe I am just not applying it correctly. Any advice on how to fully implement this, or the other solution below, would be appreciated:
a = df.notna().sum(1).lt(2).loc[lambda x: x].index.get_level_values(1)
df_final = df.query('ilevel_1 not in @a')
Please note that in the actual DF, there will be more than one series that meet the nan threshold and will therefore need to be removed...
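One way this requirement might be approached is to collect every level-1 label that has at least one row below the threshold, then filter on a plain boolean array. This is a sketch rather than a confirmed fix for the attempts above. The names thresh, labels, bad_labels and alt are illustrative, and the frame is rebuilt under the assumption of unnamed index levels and integer year columns.

```python
import numpy as np
import pandas as pd

# Example data from the question; level names and column dtype are assumed.
index = pd.MultiIndex.from_product([["bob", "bill"], ["A", "B", "C"]])
df = pd.DataFrame(
    [
        [123, 31, 4, 12],
        [41, 1, 56, 13],
        [np.nan, np.nan, 4, np.nan],
        [451, 8, np.nan, 24],
        [32, 5, 52, 6],
        [623, 12, 41, 14],
    ],
    index=index,
    columns=[2001, 2002, 2003, 2004],
)

thresh = 2  # minimum number of non-NaN values a row must have

# Count non-NaN values per row and look up each row's level-1 label.
non_na_per_row = df.notna().sum(axis=1)
labels = df.index.get_level_values(1)

# Every label with at least one row below the threshold (can be several).
bad_labels = labels[(non_na_per_row < thresh).to_numpy()].unique()

# Keep only the rows whose label is not in that set.
df_final = df[~labels.isin(bad_labels)]
print(df_final)

# Alternative closer to the groupby attempt: keep a whole group only if
# all of its rows meet the threshold.
alt = df.groupby(level=1).filter(
    lambda g: (g.notna().sum(axis=1) >= thresh).all()
)
```

Because the mask is built from a set of labels, it should not matter how many series fall below the threshold; each one is removed for every name.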
Further explanation of the expected result, using:
from collections import Counter
counts = Counter(df.index.get_level_values(1))
pd.DataFrame(list(counts.keys()), index=list(counts.values()))
#Where level 1 of the index holds the series labels (A, B, C, etc.)
I would expect an output of:
2 A
2 B
...
#Where the count of the series keys is the same for each series
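The same check might also be written with value_counts on the level-1 labels. The sketch below runs on a hand-built index matching the expected result (C removed for every name), so idx is illustrative and not taken from the actual data.

```python
import pandas as pd

# Index of the expected result from the question: C removed everywhere.
idx = pd.MultiIndex.from_product([["bob", "bill"], ["A", "B"]])

# Number of rows remaining for each series label.
counts = idx.get_level_values(1).value_counts()
print(counts)

# Every remaining label should appear the same number of times.
same_count_everywhere = counts.nunique() == 1
```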
Source: https://stackoverflow.com/questions/59621000/drop-series-from-entire-df-if-row-has-at-least-2-nan-values