Standard deviation of a list

房东的猫 提交于 2019-11-28 15:33:40
Bengt

Since Python 3.4 / PEP450 there is a statistics module in the standard library, which has a method stdev for calculating the standard deviation of iterables like yours:

>>> A_rank = [0.8, 0.4, 1.2, 3.7, 2.6, 5.8]
>>> import statistics
>>> statistics.stdev(A_rank)
2.0634114147853952

I would put A_Rank et al into a 2D NumPy array, and then use numpy.mean() and numpy.std() to compute the means and the standard deviations:

In [17]: import numpy

In [18]: arr = numpy.array([A_rank, B_rank, C_rank])

In [20]: numpy.mean(arr, axis=0)
Out[20]: 
array([ 0.7       ,  2.2       ,  1.8       ,  2.13333333,  3.36666667,
        5.1       ])

In [21]: numpy.std(arr, axis=0)
Out[21]: 
array([ 0.45460606,  1.29614814,  1.37355985,  1.50628314,  1.15566239,
        1.2083046 ])

Here's some pure-Python code you can use to calculate the mean and standard deviation.

All code below is based on the statistics module in Python 3.4+.

def mean(data):
    """Return the sample arithmetic mean of data."""
    n = len(data)
    if n < 1:
        raise ValueError('mean requires at least one data point')
    return sum(data)/n # in Python 2 use sum(data)/float(n)

def _ss(data):
    """Return sum of square deviations of sequence data."""
    c = mean(data)
    ss = sum((x-c)**2 for x in data)
    return ss

def stddev(data, ddof=0):
    """Calculates the population standard deviation
    by default; specify ddof=1 to compute the sample
    standard deviation."""
    n = len(data)
    if n < 2:
        raise ValueError('variance requires at least two data points')
    ss = _ss(data)
    pvar = ss/(n-ddof)
    return pvar**0.5

Note: for improved accuracy when summing floats, the statistics module uses a custom function _sum rather than the built-in sum which I've used in its place.

Now we have for example:

>>> mean([1, 2, 3])
2.0
>>> stddev([1, 2, 3]) # population standard deviation
0.816496580927726
>>> stddev([1, 2, 3], ddof=1) # sample standard deviation
0.1
Ome

In Python 2.7.1, you may calculate standard deviation using numpy.std() for:

  • Population std: Just use numpy.std() with no additional arguments besides to your data list.
  • Sample std: You need to pass ddof (i.e. Delta Degrees of Freedom) set to 1, as in the following example:

numpy.std(< your-list >, ddof=1)

The divisor used in calculations is N - ddof, where N represents the number of elements. By default ddof is zero.

It calculates sample std rather than population std.

B.Kocis

In python 2.7 you can use NumPy's numpy.std() gives the population standard deviation.

In Python 3.4 statistics.stdev() returns the sample standard deviation. The pstdv() function is the same as numpy.std().

pure python code:

from math import sqrt

def stddev(lst):
    mean = float(sum(lst)) / len(lst)
    return sqrt(float(reduce(lambda x, y: x + y, map(lambda x: (x - mean) ** 2, lst))) / len(lst))

Using python, here are few methods:

import statistics as st

n = int(input())
data = list(map(int, input().split()))

Approach1 - using a function

stdev = st.pstdev(data)

Approach2: calculate variance and take square root of it

variance = st.pvariance(data)
devia = math.sqrt(variance)

Approach3: using basic math

mean = sum(data)/n
variance = sum([((x - mean) ** 2) for x in X]) / n
stddev = variance ** 0.5

print("{0:0.1f}".format(stddev))

Note:

  • variance calculates variance of sample population
  • pvariance calculates variance of entire population
  • similar differences between stdev and pstdev

The other answers cover how to do std dev in python sufficiently, but no one explains how to do the bizarre traversal you've described.

I'm going to assume A-Z is the entire population. If not see Ome's answer on how to inference from a sample.

So to get the standard deviation/mean of the first digit of every list you would need something like this:

#standard deviation
numpy.std([A_rank[0], B_rank[0], C_rank[0], ..., Z_rank[0]])

#mean
numpy.mean([A_rank[0], B_rank[0], C_rank[0], ..., Z_rank[0]])

To shorten the code and generalize this to any nth digit use the following function I generated for you:

def getAllNthRanks(n):
    return [A_rank[n], B_rank[n], C_rank[n], D_rank[n], E_rank[n], F_rank[n], G_rank[n], H_rank[n], I_rank[n], J_rank[n], K_rank[n], L_rank[n], M_rank[n], N_rank[n], O_rank[n], P_rank[n], Q_rank[n], R_rank[n], S_rank[n], T_rank[n], U_rank[n], V_rank[n], W_rank[n], X_rank[n], Y_rank[n], Z_rank[n]] 

Now you can simply get the stdd and mean of all the nth places from A-Z like this:

#standard deviation
numpy.std(getAllNthRanks(n))

#mean
numpy.mean(getAllNthRanks(n))
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!