Is there a function that can remove outliers?

生来不讨喜 2021-01-19 10:17

I found a function to detect outliers in a column, but I do not know how to remove them.

Is there a function for excluding or removing outliers from the column?

4 answers

佛祖请我去吃肉 (original poster)

2021-01-19 10:45
I presume that by "remove the outliers" you mean "remove rows from the df dataframe which contain an outlier in the 'Pre_TOTAL_PURCHASE_ADJ' column." If this is incorrect, perhaps you could revise the question to make your meaning clear.

Sample data are also helpful, rather than forcing would-be answerers to formulate their own.

It's generally much more efficient to avoid iterating over the rows of a dataframe. For row selection, so-called Boolean array indexing is a fast way of achieving your ends. Since you already have a predicate (a function returning a truth value) that identifies the rows you want to exclude, you can use that predicate to build another dataframe that contains only the outliers or, by negating the predicate, only the non-outliers.
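As a minimal sketch of Boolean array indexing, the snippet below builds a mask and uses it both directly and negated. The sample data and the 1.5 × IQR rule are assumptions made purely for illustration; the question does not say which outlier test is in use, so substitute your own predicate for `mask`.

```python
import pandas as pd

# Hypothetical sample data; only the column name comes from the question.
df = pd.DataFrame(
    {"Pre_TOTAL_PURCHASE_ADJ": [10.0, 12.0, 11.0, 13.0, 12.0, 500.0]}
)

col = df["Pre_TOTAL_PURCHASE_ADJ"]

# Assumed predicate: flag values more than 1.5 * IQR outside the quartiles.
q1, q3 = col.quantile(0.25), col.quantile(0.75)
iqr = q3 - q1
mask = (col < q1 - 1.5 * iqr) | (col > q3 + 1.5 * iqr)

# The Boolean Series selects rows without any explicit Python loop.
outliers = df[mask]       # only the rows flagged by the predicate
non_outliers = df[~mask]  # negated predicate: everything else
```

Neither selection modifies `df`; each one returns a new dataframe, so the original data stays available for comparison.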

Since @political_scientist has already given a practical solution that uses scipy.stats.zscore to produce the predicate values in a new is_outlier column, I will leave this answer as simple, general advice for working in numpy and pandas. Given that answer, the rows you want would be given by
```
df[~df['is_outlier']]
```
though it might be slightly more comprehensible to include the negation (~) in the generation of the selector column rather than in the indexing as above, renaming the column 'is_not_outlier'.
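To make both variants concrete, here is a rough sketch that builds the selector column from z-scores. The sample data and the threshold of 3 are assumptions, and the z-score is computed with plain pandas using the population standard deviation (`ddof=0`), which matches the default of scipy.stats.zscore; it is not necessarily what the referenced answer does in detail.

```python
import pandas as pd

# Hypothetical sample data; only the column name comes from the question.
df = pd.DataFrame(
    {
        "Pre_TOTAL_PURCHASE_ADJ": [
            10.0, 12.0, 11.0, 13.0, 12.0, 11.0,
            10.0, 13.0, 12.0, 11.0, 12.0, 500.0,
        ]
    }
)

col = df["Pre_TOTAL_PURCHASE_ADJ"]

# z-score with the population standard deviation (ddof=0).
z = (col - col.mean()) / col.std(ddof=0)

# Variant 1: flag the outliers, then negate when indexing.
df["is_outlier"] = z.abs() > 3  # threshold of 3 is an assumed convention
cleaned = df[~df["is_outlier"]]

# Variant 2: put the negation into the selector column itself.
df["is_not_outlier"] = z.abs() <= 3
cleaned_alt = df[df["is_not_outlier"]]
```

Both variants keep the same rows. Note that a single extreme value in a very small sample may never exceed a z-score of 3, so the threshold should be chosen with the sample size in mind.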