Finding an unusual value in an array or list

心已入冬 submitted on 2019-12-21 04:39:27

Question


I have sales statistics in an array, from which I calculate the standard deviation or the average.

stats = [100, 98, 102, 100, 108, 23, 120] 

Let's say a difference of ±20% is normal; 23 is obviously a special case.

What's the best algorithm (in any language, pseudocode, or just the principle) to find this unusual value?


Answer 1:


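You could convert the values to Z-scores (each value's distance from the mean, measured in standard deviations) and look for outliers.

>>> import numpy as np
>>> stats = [100, 98, 102, 100, 108, 23, 120]
>>> mean = np.mean(stats)
>>> std = np.std(stats)  # population standard deviation (ddof=0)
>>> # Z-score: how many standard deviations each value lies from the mean
>>> stats_z = [(s - mean)/std for s in stats]
>>> np.abs(stats_z) > 2  # True marks values more than 2 standard deviations away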
array([False, False, False, False, False,  True, False], dtype=bool)



Answer 2:


Compute the average and standard deviation. Treat any value more than X standard deviations from the average as "unusual" (where X will probably be somewhere around 2.5 to 3.0 or so).
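The rule above can be sketched in a few lines of plain Python. This is only a minimal illustration: the function name `find_outliers` and its `threshold` parameter (the X above) are made up for this example, and it assumes the population standard deviation is the one wanted.

```python
import statistics

def find_outliers(values, threshold=2.5):
    """Return the values lying more than `threshold` standard deviations from the mean.

    `find_outliers` and `threshold` are hypothetical names for this sketch.
    """
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        return []  # all values identical, so nothing can be unusual
    return [v for v in values if abs(v - mean) / std > threshold]

stats = [100, 98, 102, 100, 108, 23, 120]
print(find_outliers(stats, threshold=2.0))  # [23]
print(find_outliers(stats, threshold=2.5))  # []
```

On the sample from the question, 23 lies about 2.38 standard deviations from the mean, so it is caught with X = 2.0 but not with X = 2.5. With very small samples a lower X may be needed.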

There are quite a few variations on this theme. If you need something that's really statistically sound, you might want to look into some of them; they can spare you from having to defend an arbitrary choice such as 2.7 standard deviations as the dividing line.
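One such variation, shown here purely as an example and not necessarily the one the answer had in mind, is the modified Z-score based on the median and the median absolute deviation (MAD). It is less affected by the outlier itself than the mean and standard deviation are. The function name `mad_outliers` is hypothetical; the constant 0.6745 and the cutoff 3.5 are the values commonly quoted for this method.

```python
import statistics

def mad_outliers(values, threshold=3.5):
    """Return values whose modified Z-score exceeds `threshold`.

    Hypothetical helper for illustration; uses the median and the
    median absolute deviation (MAD) in place of mean and std.
    """
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return []  # assumption: treat a zero MAD as "no outliers detectable"
    # 0.6745 scales the MAD so the score is comparable to a normal Z-score
    return [v for v in values if abs(0.6745 * (v - med) / mad) > threshold]

stats = [100, 98, 102, 100, 108, 23, 120]
print(mad_outliers(stats))  # [23, 120]
```

On the sample data this flags 120 as well as 23, because the remaining values are clustered so tightly around 100 that a deviation of 20 already looks large.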




Answer 3:


Find the standard deviation; any value lying more than 3 sigma from the mean (outside ±3 sigma) is an outlier.

In theory, for normally distributed data, the ±3 sigma range covers more than 99% of values (about 99.7%).
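The coverage figure can be checked with the standard library, assuming the data are normally distributed. The sketch below also computes the largest Z-score that any single value can reach in a sample of n values when the population standard deviation is used, which is sqrt(n - 1). Both function names are made up for this illustration.

```python
import math

def normal_coverage(k):
    """Fraction of a normal distribution lying within +-k sigma of the mean."""
    return math.erf(k / math.sqrt(2))

def max_abs_z(n):
    """Largest |Z-score| any one value can have among n values
    (population standard deviation); this equals sqrt(n - 1)."""
    return math.sqrt(n - 1)

print(round(normal_coverage(3), 4))  # 0.9973
print(round(max_abs_z(7), 3))        # 2.449
```

For the 7-element array in the question the bound is about 2.45, so a strict 3 sigma cutoff could never flag any value there; the rule is better suited to larger samples.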



Source: https://stackoverflow.com/questions/10510169/finding-unusual-value-in-an-array-list
