Finding outliers in a data set

说谎 2021-02-07 07:38

I have a Python script that creates a list of lists of server uptime and performance data, where each sub-list (or 'row') contains a particular cluster's stats. For example,

4 Answers
  •  半阙折子戏
    2021-02-07 07:44

    I think your best bet is to look at SciPy's scoreatpercentile function (scipy.stats.scoreatpercentile). For instance, you could try excluding all values above the 99th percentile.
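A minimal sketch of the percentile cut, using only the standard library. The row layout ([cluster_name, uptime_percent, avg_response_ms]), the data and the helper names are all hypothetical, since the question's actual example is not shown. The interpolation rule mirrors the default ("fraction") behaviour of scipy.stats.scoreatpercentile, so with SciPy installed, scipy.stats.scoreatpercentile(values, 99) should give the same cutoff.

```python
# Minimal sketch: drop rows whose metric lies above a given percentile.
# The row layout below is hypothetical; adjust `col` to match your real data.

def score_at_percentile(values, per):
    """Return the value at percentile `per` (0-100) of `values`, interpolating
    linearly between the two nearest order statistics."""
    ordered = sorted(values)
    pos = per / 100 * (len(ordered) - 1)  # fractional index into the sorted data
    lower = int(pos)
    if lower + 1 >= len(ordered):  # per == 100 lands exactly on the maximum
        return ordered[-1]
    frac = pos - lower
    return ordered[lower] + (ordered[lower + 1] - ordered[lower]) * frac


def split_at_percentile(rows, col, per=99):
    """Split rows into (kept, dropped) by comparing row[col] to the percentile cutoff."""
    cutoff = score_at_percentile([row[col] for row in rows], per)
    kept = [row for row in rows if row[col] <= cutoff]
    dropped = [row for row in rows if row[col] > cutoff]
    return kept, dropped


# Hypothetical rows: [cluster_name, uptime_percent, avg_response_ms]
rows = [
    ["cluster-01", 99.95, 102],
    ["cluster-02", 99.90, 98],
    ["cluster-03", 99.97, 105],
    ["cluster-04", 99.93, 99],
    ["cluster-05", 99.96, 101],
    ["cluster-06", 99.91, 97],
    ["cluster-07", 99.94, 103],
    ["cluster-08", 99.92, 100],
    ["cluster-09", 99.95, 104],
    ["cluster-10", 97.20, 5000],
]

kept, dropped = split_at_percentile(rows, col=2, per=99)
print([row[0] for row in dropped])  # ['cluster-10']
```

Keep in mind that a fixed percentile cut always removes roughly the top 1% of values whether or not they are abnormal; with only ten rows, the 99th-percentile cutoff falls between the two largest values, so at most the single largest row is dropped.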

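    The mean and standard deviation are not much use if your data are not normally distributed, and any outliers that are present will themselves pull the mean and inflate the standard deviation.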
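A quick illustration of that point, on hypothetical response times for ten clusters. A "more than 3 standard deviations from the mean" rule flags nothing. The outlier inflates the very standard deviation it is measured against, and with n = 10 no point can lie more than (n - 1)/sqrt(n), about 2.85, sample standard deviations from the mean. For contrast, the sketch also applies Tukey's fences (Q1 - 1.5*IQR, Q3 + 1.5*IQR). This is a common quartile-based rule that the answer does not mention, swapped in here because, like the percentile approach, it does not rely on the mean or standard deviation.

```python
import statistics


def zscore_outliers(values, threshold=3.0):
    """Values more than `threshold` sample standard deviations from the mean."""
    mean = statistics.fmean(values)
    std = statistics.stdev(values)
    return [v for v in values if abs(v - mean) > threshold * std]


def iqr_outliers(values, k=1.5):
    """Values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    # quantiles() sorts internally; "inclusive" interpolates between data points.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical average response times (ms) for ten clusters; the last is clearly abnormal.
response_ms = [102, 98, 105, 99, 101, 97, 103, 100, 104, 5000]

print(zscore_outliers(response_ms))  # []
print(iqr_outliers(response_ms))     # [5000]
```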

    In general, it's good to get a rough visual idea of what your data look like before deciding on a plan. matplotlib is the usual tool for this; I recommend making a few plots of your data with it first.
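matplotlib's plt.hist and plt.boxplot are the usual choices for that first look (a box plot's default whiskers reach 1.5 times the IQR beyond the box, so points past them are drawn individually as candidate outliers). Where matplotlib isn't installed, a crude text histogram can serve as a quick check. The helper below is a hypothetical, dependency-free sketch, not part of any library, and the data is made up.

```python
def text_histogram(values, bins=10, width=40):
    """Return one text line per equal-width bin: left edge, a bar of '#', and the count."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid dividing by zero when all values are equal
    counts = [0] * bins
    for v in values:
        # Scale each value into [0, bins); the maximum is clamped into the last bin.
        idx = min(int((v - lo) / span * bins), bins - 1)
        counts[idx] += 1
    peak = max(counts)
    lines = []
    for i, count in enumerate(counts):
        left_edge = lo + span * i / bins
        bar = "#" * round(count / peak * width)
        lines.append(f"{left_edge:>10.1f} | {bar} {count}")
    return lines


# Hypothetical average response times (ms) for ten clusters.
response_ms = [102, 98, 105, 99, 101, 97, 103, 100, 104, 5000]
for line in text_histogram(response_ms):
    print(line)
```

A lone bar far to the right of an otherwise tight cluster is the pattern worth investigating before picking any cutoff.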
