matplotlib: disregard outliers when plotting


广开言路 2020-12-04 13:13

I'm plotting some data from various tests. Sometimes in a test I happen to have one outlier (say 0.1), while all other values are three orders of magnitude smaller.

4 answers

刺人心 (original poster)

2020-12-04 13:54
I usually pass the data through np.clip. If you have a reasonable estimate of the minimum and maximum of your data, use those as the bounds. If you don't, the histogram of the clipped data will show you the size of the tails: clipped values pile up at the bounds, and if the outliers really are just outliers, those tails should be small.

What I run is something like this:
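```
import numpy as np
import matplotlib.pyplot as plt

# Scale 3 is assumed here so the upper bound of 8 visibly cuts into the data.
data = np.random.normal(3, 3, size=100000)
# Values outside [-15, 8] are set to the nearest bound and land in the edge bins.
plt.hist(np.clip(data, -15, 8), bins=333, density=True)
plt.show()
```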
Change the min and max passed to the clipping function and compare the resulting histograms until you find bounds that suit your data.

In this example, you can see immediately that the max value of 8 is too low: the clipped values pile up at the upper bound, which means you are removing a lot of meaningful information. The min value of -15 should be fine, since no tail is visible there at all.
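
If you would rather put a number on the tails than judge them by eye, a minimal sketch like the one below may help. The helper name `clipped_fractions` is hypothetical (it is not part of NumPy or matplotlib), and the synthetic data (mean 3, standard deviation 3) is just an assumption chosen for illustration.

```python
import numpy as np

def clipped_fractions(data, lower, upper):
    """Hypothetical helper: fraction of samples below `lower` and above `upper`.

    These are the samples that np.clip(data, lower, upper) would move
    onto the bounds.
    """
    data = np.asarray(data)
    below = float(np.mean(data < lower))
    above = float(np.mean(data > upper))
    return below, above

# Synthetic data, assumed for illustration only.
rng = np.random.default_rng(0)
data = rng.normal(3, 3, size=100_000)

below, above = clipped_fractions(data, -15, 8)
print(f"clipped below: {below:.4%}, clipped above: {above:.4%}")
```

A large fraction on one side suggests that bound is cutting into real data rather than trimming a few outliers.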

Building on this, you could probably write some code that finds good bounds automatically, keeping the size of the tails within some tolerance.
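
One possible way to do that, sketched here under assumptions rather than taken from the answer itself, is to use quantiles: pick the bounds so that at most a chosen fraction of the samples falls outside on each side. The helper name `tail_bounds`, the default tolerance of 0.001, and the synthetic data are all hypothetical.

```python
import numpy as np

def tail_bounds(data, tolerance=0.001):
    """Hypothetical helper: bounds that leave roughly `tolerance` of the
    samples outside on each side, computed with np.quantile."""
    data = np.asarray(data)
    lower, upper = np.quantile(data, [tolerance, 1.0 - tolerance])
    return float(lower), float(upper)

# Synthetic data, assumed for illustration: many small values plus a single
# outlier of 0.1, similar to the situation described in the question.
rng = np.random.default_rng(0)
data = np.append(rng.normal(1e-4, 1e-5, size=10_000), 0.1)

lower, upper = tail_bounds(data, tolerance=0.001)
clipped = np.clip(data, lower, upper)  # the outlier is pulled down to `upper`
print(f"bounds: [{lower:.3e}, {upper:.3e}]")
```

The `clipped` array can then be passed to `plt.hist` in the same way as above. A quantile-based rule always clips a fixed share of the samples, even when there are no real outliers, so the tolerance should stay small.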