Pandas fillna() based on specific column attribute

Question

Let's say I have this table

Type  Killed  Survived
Dog   5       2
Dog   3       4
Cat   1       7
Dog   NaN     3
cow   NaN     2

One of the values in Killed is missing for [Type] = Dog.

I want to impute the mean of [Killed] for [Type] = Dog.

My code is as follows:

Search for the mean

df[df['Type'] == 'Dog'].mean().round()

This will give me the mean (around 2.25)

Impute the mean (This is where the problem begins)

df.loc[(df['Type'] == 'Dog') & (df['Killed'])].fillna(2.25, inplace = True)

The code runs, but the value is not imputed; the NaN value is still there.

My question is: how do I impute the mean in [Killed] based on [Type] = Dog?

Answer 1:

This works for me:

df.loc[df['Type'] == 'Dog', 'Killed'] = df.loc[df['Type'] == 'Dog', 'Killed'].fillna(2.25)
print (df)
  Type  Killed  Survived
0  Dog    5.00         2
1  Dog    3.00         4
2  Cat    1.00         7
3  Dog    2.25         3
4  cow     NaN         2

If you need to fill with a Series, because the mean is computed for both columns, Killed and Survived:

m = df[df['Type'] == 'Dog'].mean(numeric_only=True).round()
print (m)
Killed      4.0
Survived    3.0
dtype: float64

df.loc[df['Type'] == 'Dog'] = df.loc[df['Type'] == 'Dog'].fillna(m)
print (df)
  Type  Killed  Survived
0  Dog     5.0         2
1  Dog     3.0         4
2  Cat     1.0         7
3  Dog     4.0         3
4  cow     NaN         2

If you need to fill only the Killed column:

# omit round() if you don't need rounding
m = round(df.loc[df['Type'] == 'Dog', 'Killed'].mean())
print (m)
4

df.loc[df['Type'] == 'Dog', 'Killed'] = df.loc[df['Type'] == 'Dog', 'Killed'].fillna(m)
print (df)
  Type  Killed  Survived
0  Dog     5.0         2
1  Dog     3.0         4
2  Cat     1.0         7
3  Dog     4.0         3
4  cow     NaN         2

You can also store the filtered selection and reuse it:

filtered = df.loc[df['Type'] == 'Dog', 'Killed']
print (filtered)
0    5.0
1    3.0
3    NaN
Name: Killed, dtype: float64

df.loc[df['Type'] == 'Dog', 'Killed'] = filtered.fillna(filtered.mean())
print (df)
  Type  Killed  Survived
0  Dog     5.0         2
1  Dog     3.0         4
2  Cat     1.0         7
3  Dog     4.0         3
4  cow     NaN         2
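
The pattern above can be wrapped in a small helper. This is only a sketch under the question's data: the function name fill_group_mean and its parameters are made up for illustration and are not part of pandas.

```python
import numpy as np
import pandas as pd


def fill_group_mean(df, group_col, group_value, target_col):
    """Fill NaN in target_col with that column's mean, but only for rows
    where group_col equals group_value. Hypothetical helper, not a pandas API.
    Modifies df in place and also returns it."""
    mask = df[group_col] == group_value
    selected = df.loc[mask, target_col]
    # Series.mean() skips NaN by default, so only known values are averaged.
    df.loc[mask, target_col] = selected.fillna(selected.mean())
    return df


df = pd.DataFrame({
    'Type': ['Dog', 'Dog', 'Cat', 'Dog', 'cow'],
    'Killed': [5, 3, 1, np.nan, np.nan],
    'Survived': [2, 4, 7, 3, 2],
})

df = fill_group_mean(df, 'Type', 'Dog', 'Killed')
print(df)
```

Only the Dog row is filled (with 4.0); the cow row keeps its NaN because it is outside the mask.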

Answer 2:

groupby with transform

df.groupby('Type').Killed.transform(lambda x: x.fillna(x.mean()))

Setup

import numpy as np
import pandas as pd

df = pd.DataFrame([
        ['Dog', 5, 2],
        ['Dog', 3, 4],
        ['Cat', 1, 7],
        ['Dog', np.nan, 3],
        ['Cow', np.nan, 2]
    ], columns=['Type', 'Killed', 'Survived'])

df.Killed = df.groupby('Type').Killed.transform(lambda x: x.fillna(x.mean()))
df

If you meant to count the np.nan rows (as 0) when calculating the mean:

df.Killed = df.groupby('Type').Killed.transform(lambda x: x.fillna(x.fillna(0).mean()))
df
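
To make the difference between the two variants concrete, here is a self-contained sketch that computes both side by side on the setup data. The column names skip_nan and nan_as_zero are chosen only for this illustration. The last line shows an equivalent spelling of the first variant that avoids a Python lambda.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([
    ['Dog', 5, 2],
    ['Dog', 3, 4],
    ['Cat', 1, 7],
    ['Dog', np.nan, 3],
    ['Cow', np.nan, 2],
], columns=['Type', 'Killed', 'Survived'])

grouped = df.groupby('Type')['Killed']

# Variant 1: NaN rows are ignored when averaging (the pandas default).
skip_nan = grouped.transform(lambda x: x.fillna(x.mean()))

# Variant 2: NaN rows are counted as 0 when averaging.
nan_as_zero = grouped.transform(lambda x: x.fillna(x.fillna(0).mean()))

# Same result as variant 1, using the built-in 'mean' aggregation.
vectorized = df['Killed'].fillna(grouped.transform('mean'))

print(pd.DataFrame({
    'Type': df['Type'],
    'skip_nan': skip_nan,
    'nan_as_zero': nan_as_zero,
}))
```

With this data the Cow group has no known value, so variant 1 leaves it as NaN while variant 2 fills it with 0. For Dog, the fill value is 4 in variant 1 and 8/3 (about 2.67) in variant 2.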

Answer 3:

There are two problems. First, note that df.loc[(df['Type'] == 'Dog') & (df['Killed'])] isn't doing what (I presume) you think it is doing. Instead of selecting the rows where Type is Dog together with the column 'Killed', it selects rows by the elementwise "and" of the Type mask and the 'Killed' column interpreted as booleans. That gives you garbage: the mask is False precisely where 'Killed' is NaN, so the row you want to fill is excluded!

See:

In [6]: df.loc[(df['Type'] == 'Dog') & (df['Killed'])]
Out[6]: 
  Type  Killed  Survived
0  Dog     5.0         2
1  Dog     3.0         4

What you want is the following:

In [5]: df.loc[(df['Type'] == 'Dog'), ['Killed']]
Out[5]: 
   Killed
0     5.0
1     3.0
3     NaN
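
If the second condition was meant as "rows where Killed is missing", that can be stated explicitly with isna(). The following is a sketch under that assumption, using the question's data; the variable names is_dog, to_fill and dog_mean are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Type': ['Dog', 'Dog', 'Cat', 'Dog', 'cow'],
    'Killed': [5, 3, 1, np.nan, np.nan],
    'Survived': [2, 4, 7, 3, 2],
})

is_dog = df['Type'] == 'Dog'
# Rows that are Dog AND have a missing Killed value.
to_fill = is_dog & df['Killed'].isna()

# Mean over the Dog rows only; NaN is skipped by default.
dog_mean = df.loc[is_dog, 'Killed'].mean()

# Row mask plus column label: write only into the cells that need filling.
df.loc[to_fill, 'Killed'] = dog_mean
print(df)
```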

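The second problem is that calling .fillna(..., inplace=True) on the result of .loc modifies a temporary copy rather than df itself, so you need to assign the result of .fillna back through .loc, like the following: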

In [6]: df.loc[(df['Type'] == 'Dog'), ['Killed']] = df.loc[(df['Type'] == 'Dog'), ['Killed']].fillna(2.25)

In [7]: df
Out[7]: 
  Type  Killed  Survived
0  Dog    5.00         2
1  Dog    3.00         4
2  Cat    1.00         7
3  Dog    2.25         3
4  cow     NaN         2
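
A minimal sketch of why the inplace call has no effect on df, assuming the question's data. The selection is bound to a name here (selected, an illustrative variable) so that both objects can be inspected afterwards.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Type': ['Dog', 'Dog', 'Cat', 'Dog', 'cow'],
    'Killed': [5, 3, 1, np.nan, np.nan],
    'Survived': [2, 4, 7, 3, 2],
})
is_dog = df['Type'] == 'Dog'

# Selecting rows with a boolean mask returns a new object, not a view of df.
selected = df.loc[is_dog]
selected.fillna(2.25, inplace=True)  # fills `selected` only
nan_after_inplace = int(df['Killed'].isna().sum())  # df still has both NaN

# Assigning back through .loc writes into df itself.
df.loc[is_dog, 'Killed'] = df.loc[is_dog, 'Killed'].fillna(2.25)
nan_after_assign = int(df['Killed'].isna().sum())  # only the cow row is left

print(nan_after_inplace, nan_after_assign)
```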

NOTE

The value you gave for your mean is wrong, or does not correspond to the data you gave in the question. The mean should be 4.
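
The arithmetic can be checked directly. This short sketch uses only the three Dog values from the question; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Killed values of the three Dog rows in the question.
killed = pd.Series([5, 3, np.nan], name='Killed')

# mean() skips NaN by default, so this is (5 + 3) / 2.
dog_mean = killed.mean()
print(dog_mean)  # 4.0
```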

Source: https://stackoverflow.com/questions/39242615/pandas-fillna-based-on-specific-column-attribute

Tags

python

pandas

indexing

nan

mean