Question
Let's say I have this table:
Type | Killed | Survived
Dog | 5 | 2
Dog | 3 | 4
Cat | 1 | 7
Dog | nan | 3
cow | nan | 2
One of the values in Killed is missing for [Type] = Dog.
I want to impute the mean into [Killed] for [Type] = Dog.
My code is as follows:
- Find the mean
df[df['Type'] == 'Dog'].mean(numeric_only=True).round()
This will give me the mean (around 2.25)
- Impute the mean (This is where the problem begins)
df.loc[(df['Type'] == 'Dog') & (df['Killed'])].fillna(2.25, inplace = True)
The code runs, but the value is not imputed; the NaN is still there.
My question is: how do I impute the mean in [Killed] based on [Type] = Dog?
Answer 1:
This works for me:
df.loc[df['Type'] == 'Dog', 'Killed'] = df.loc[df['Type'] == 'Dog', 'Killed'].fillna(2.25)
print (df)
Type Killed Survived
0 Dog 5.00 2
1 Dog 3.00 4
2 Cat 1.00 7
3 Dog 2.25 3
4 cow NaN 2
If you need to fillna with a Series (because there are two numeric columns, Killed and Survived):
m = df[df['Type'] == 'Dog'].mean(numeric_only=True).round()
print (m)
Killed 4.0
Survived 3.0
dtype: float64
df.loc[df['Type'] == 'Dog'] = df.loc[df['Type'] == 'Dog'].fillna(m)
print (df)
Type Killed Survived
0 Dog 5.0 2
1 Dog 3.0 4
2 Cat 1.0 7
3 Dog 4.0 3
4 cow NaN 2
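`DataFrame.fillna` accepts a Series and matches its index labels against the column names, so each column is filled with its own value. A minimal sketch, assuming the question's table; it selects the numeric columns explicitly so the string `Type` column is never averaged (the variable name `mask` is just for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Type": ["Dog", "Dog", "Cat", "Dog", "cow"],
    "Killed": [5, 3, 1, np.nan, np.nan],
    "Survived": [2, 4, 7, 3, 2],
})

mask = df["Type"] == "Dog"
cols = ["Killed", "Survived"]

# Per-column means over the Dog rows; NaN is skipped, so Killed -> 4.0, Survived -> 3.0
m = df.loc[mask, cols].mean()

# fillna(Series) fills each column with the value whose index label matches the column name
df.loc[mask, cols] = df.loc[mask, cols].fillna(m)

print(df)
```

Only the Dog row with a missing Killed value changes; the cow row keeps its NaN because it is outside the mask.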
If you need to fillna only in column Killed:
# if you don't need rounding, omit round()
m = round(df.loc[df['Type'] == 'Dog', 'Killed'].mean())
print (m)
4
df.loc[df['Type'] == 'Dog', 'Killed'] = df.loc[df['Type'] == 'Dog', 'Killed'].fillna(m)
print (df)
Type Killed Survived
0 Dog 5.0 2
1 Dog 3.0 4
2 Cat 1.0 7
3 Dog 4.0 3
4 cow NaN 2
You can also store the filtered Series once and reuse it:
filtered = df.loc[df['Type'] == 'Dog', 'Killed']
print (filtered)
0 5.0
1 3.0
3 NaN
Name: Killed, dtype: float64
df.loc[df['Type'] == 'Dog', 'Killed'] = filtered.fillna(filtered.mean())
print (df)
Type Killed Survived
0 Dog 5.0 2
1 Dog 3.0 4
2 Cat 1.0 7
3 Dog 4.0 3
4 cow NaN 2
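To avoid repeating the mask, the same steps can be wrapped in a small function. `fill_with_group_mean` is a hypothetical helper name used only for illustration, not a pandas API; it works on a copy so the input frame is left untouched:

```python
import numpy as np
import pandas as pd


def fill_with_group_mean(df, group_col, group_value, target_col):
    """Hypothetical helper: return a copy of df where NaNs in target_col are
    replaced by the mean of target_col over rows with df[group_col] == group_value."""
    out = df.copy()
    mask = out[group_col] == group_value
    filtered = out.loc[mask, target_col]
    # Series.mean() skips NaN by default, so only observed values are averaged
    out.loc[mask, target_col] = filtered.fillna(filtered.mean())
    return out


df = pd.DataFrame({
    "Type": ["Dog", "Dog", "Cat", "Dog", "cow"],
    "Killed": [5, 3, 1, np.nan, np.nan],
    "Survived": [2, 4, 7, 3, 2],
})

result = fill_with_group_mean(df, "Type", "Dog", "Killed")
print(result)
```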
Answer 2:
Use groupby with transform:
df.groupby('Type').Killed.transform(lambda x: x.fillna(x.mean()))
Setup
import numpy as np
import pandas as pd
df = pd.DataFrame([
['Dog', 5, 2],
['Dog', 3, 4],
['Cat', 1, 7],
['Dog', np.nan, 3],
['Cow', np.nan, 2]
], columns=['Type', 'Killed', 'Survived'])
df.Killed = df.groupby('Type').Killed.transform(lambda x: x.fillna(x.mean()))
df
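A variant that avoids the Python lambda is to compute each group's mean with `transform('mean')`, which returns a Series aligned to the original index, and pass that Series to `fillna`. This is a swapped-in alternative, not the answer's code. With this data the `Cow` group has no observed values, so its mean is NaN and that row stays missing:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([
    ['Dog', 5, 2],
    ['Dog', 3, 4],
    ['Cat', 1, 7],
    ['Dog', np.nan, 3],
    ['Cow', np.nan, 2],
], columns=['Type', 'Killed', 'Survived'])

# One value per row: each row gets the mean of Killed within its own Type group
group_means = df.groupby('Type')['Killed'].transform('mean')

# fillna with a Series aligns on the index, so each NaN gets its own group's mean
df['Killed'] = df['Killed'].fillna(group_means)
print(df)
```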
If you meant to count the NaN values as 0 when calculating the mean:
df.Killed = df.groupby('Type').Killed.transform(lambda x: x.fillna(x.fillna(0).mean()))
df
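The two transform versions differ only in how NaN enters the average. `Series.mean()` skips NaN by default (`skipna=True`), while `fillna(0)` first turns NaN into 0 so it counts toward the denominator. For the Dog rows of the setup data that is (5 + 3) / 2 = 4 versus (5 + 3 + 0) / 3, about 2.67. A small sketch on just those three values:

```python
import numpy as np
import pandas as pd

dog_killed = pd.Series([5, 3, np.nan], name='Killed')

skip_nan_mean = dog_killed.mean()               # NaN ignored: (5 + 3) / 2
nan_as_zero_mean = dog_killed.fillna(0).mean()  # NaN counted as 0: (5 + 3 + 0) / 3
strict_mean = dog_killed.mean(skipna=False)     # any NaN makes the result NaN

print(skip_nan_mean, nan_as_zero_mean, strict_mean)
```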
Answer 3:
There are two problems. First, note that df.loc[(df['Type'] == 'Dog') & (df['Killed'])] isn't doing what (I presume) you think it does. Instead of selecting the rows where Type is Dog together with the column 'Killed', you are taking the boolean mask for Type == Dog and doing an elementwise "and" with the values of the column 'Killed' itself. That gives you garbage: the result is False precisely where 'Killed' is NaN!
See:
In [6]: df.loc[(df['Type'] == 'Dog') & (df['Killed'])]
Out[6]:
Type Killed Survived
0 Dog 5.0 2
1 Dog 3.0 4
What you want is the following:
In [5]: df.loc[(df['Type'] == 'Dog'), ['Killed']]
Out[5]:
Killed
0 5.0
1 3.0
3 NaN
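If the intent of `& (df['Killed'])` was to narrow the selection to Dog rows whose Killed value is missing, one way to express that is to combine two boolean masks, using `isna()` for the second one. A minimal sketch under that assumption (the names `is_dog`, `missing` and `dog_missing` are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Type': ['Dog', 'Dog', 'Cat', 'Dog', 'cow'],
    'Killed': [5, 3, 1, np.nan, np.nan],
    'Survived': [2, 4, 7, 3, 2],
})

is_dog = df['Type'] == 'Dog'
missing = df['Killed'].isna()

# Both operands of & are boolean Series, so this is a genuine row filter
dog_missing = is_dog & missing

# Mean of the observed Dog values (NaN skipped), written only into the missing Dog rows
df.loc[dog_missing, 'Killed'] = df.loc[is_dog, 'Killed'].mean()
print(df)
```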
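The second problem is that df.loc[...] with a boolean mask returns a copy, so fillna(..., inplace=True) only changes that copy. You need to assign the result of .fillna back through .loc instead, like the following: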
In [6]: df.loc[(df['Type'] == 'Dog'), ['Killed']] = df.loc[(df['Type'] == 'Dog'), ['Killed']].fillna(2.25)
In [7]: df
Out[7]:
Type Killed Survived
0 Dog 5.00 2
1 Dog 3.00 4
2 Cat 1.00 7
3 Dog 2.25 3
4 cow NaN 2
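The sketch below contrasts the question's pattern with the working one. Selecting rows with a boolean mask through `df.loc[...]` produces a new object, so `inplace=True` modifies that temporary object rather than `df`. Warnings are silenced only because some pandas versions emit a `SettingWithCopyWarning` for the first pattern:

```python
import warnings

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Type': ['Dog', 'Dog', 'Cat', 'Dog', 'cow'],
    'Killed': [5, 3, 1, np.nan, np.nan],
    'Survived': [2, 4, 7, 3, 2],
})
mask = df['Type'] == 'Dog'

# Pattern from the question: fillna(inplace=True) on a boolean-mask selection
with warnings.catch_warnings():
    warnings.simplefilter('ignore')
    subset = df.loc[mask, ['Killed']]
    subset.fillna(2.25, inplace=True)  # fills the copy only

still_missing = df['Killed'].isna().sum()  # df itself is unchanged: both NaNs remain

# Working pattern: assign the filled result back through .loc
df.loc[mask, ['Killed']] = df.loc[mask, ['Killed']].fillna(2.25)
print(df)
```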
NOTE: The value you gave for your mean is wrong, or does not correspond to the data you gave in the question. The mean should be 4.
Source: https://stackoverflow.com/questions/39242615/pandas-fillna-based-on-specific-column-attribute