Pandas Fillna Mode

借酒劲吻你 2021-02-05 12:13

I have a data set with a column called Native Country that contains around 30,000 records. Some are missing, represented by NaN, so I thoug

6 Answers

萌比男神i (OP)

2021-02-05 12:44
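If we fill in the missing values with fillna(df['colX'].mode()), the result of mode() is a Series, so fillna aligns it on the index and fills only the rows whose index labels match those of the mode Series (typically just the first row, or the first few rows if there are several modes). At least, that is what happens when it is done as below: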
```
# col.mode() is a Series, so fillna aligns on the index and fills only matching rows
fill_mode = lambda col: col.fillna(col.mode())
df.apply(fill_mode, axis=0)
```
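To make the index alignment concrete, here is a minimal sketch on a made-up column (the values and the name `col` are illustrative assumptions, not taken from the question's data). The mode Series has only index label 0, so only the NaN at label 0 gets filled:

```python
import numpy as np
import pandas as pd

# Hypothetical column: one mode ("US"), missing values at labels 0, 2 and 4
col = pd.Series([np.nan, "US", np.nan, "US", np.nan, "India"])

mode_series = col.mode()          # a Series holding "US" at index label 0
filled = col.fillna(mode_series)  # aligned on index: only label 0 is filled

remaining_nans = int(filled.isna().sum())
print(filled)
print("NaNs left:", remaining_nans)
```

The NaNs at labels 2 and 4 stay in place because the mode Series has no entries for those labels.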
However, simply taking the first value of the Series, fillna(df['colX'].mode()[0]), risks introducing unintended bias in the data, I think. mode() returns the modes in sorted order, so if the sample is multimodal, taking just the first mode always picks the smallest one, which makes an already biased imputation method worse. For example, we would take only 0 if [0, 21, 99] are the equally most frequent values, or fill missing values with False when True and False are equally frequent in a given column.
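A small sketch of the multimodal case, using the [0, 21, 99] example from above on a made-up Series (the name `s` and the data are assumptions for illustration). Every missing value ends up as the smallest mode:

```python
import numpy as np
import pandas as pd

# Hypothetical sample where 0, 21 and 99 are equally frequent
s = pd.Series([0, 0, 21, 21, 99, 99, np.nan, np.nan])

modes = s.mode()             # all three modes, in sorted order
filled = s.fillna(modes[0])  # every NaN becomes the first (smallest) mode

print(modes.tolist())
print(filled.tolist())
```

After the fill, 0 appears four times while 21 and 99 still appear twice each, so the imputation has turned a three-way tie into a single clear mode.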

I don't have a clear-cut solution here. If using the mode is a necessity, one approach could be to assign each missing value a random pick from all the equally frequent modes.
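One way the random-assignment idea could look is sketched below. The helper name `fillna_random_mode` is hypothetical, and drawing uniformly from the modes with a seeded NumPy generator is an assumption about how one might do it, not an established pandas feature:

```python
import numpy as np
import pandas as pd

def fillna_random_mode(col: pd.Series, rng: np.random.Generator) -> pd.Series:
    """Fill NaNs with values drawn uniformly at random from the column's modes."""
    modes = col.mode()
    missing = col.isna()
    # Nothing to do if the column is all-NaN (no modes) or has no gaps
    if modes.empty or not missing.any():
        return col.copy()
    # One independent draw per missing value
    draws = rng.choice(modes.to_numpy(), size=int(missing.sum()))
    filled = col.copy()
    filled[missing] = draws
    return filled

rng = np.random.default_rng(0)  # seeded so the fill is reproducible
s = pd.Series([0, 0, 21, 21, 99, 99, np.nan, np.nan, np.nan])
filled = fillna_random_mode(s, rng)
print(filled.tolist())
```

For a whole DataFrame this could be applied per column, for example with df.apply(lambda c: fillna_random_mode(c, rng)). It spreads the imputed values across the tied modes instead of always favouring the smallest one, but it is still mode imputation and still pulls the distribution toward the most frequent values.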