Numpy Histogram Representing Floats with Approximate Values as The Same

Posted by 佐手、 on 2021-02-09 20:41:41

Question


I have code that maps a probability p in the range [0, 1) to a value between -10 and 10. Each value is appended to a list a number of times proportional to its probability. For example, -10 is appended 0 times because it corresponds to p = 0, and 10 would be appended 100 times (as a normalization) because it corresponds to p = 1.

Here is the code:

#!/usr/bin/env python

import math
import numpy as np
import matplotlib.pyplot as plt

pos = []
ceilingValue = 0.82
# num must be an integer; recent NumPy versions reject a float here
nValues = int(100*ceilingValue)
pValues = np.linspace(0.00, ceilingValue, num=nValues)

for i in range(nValues):
    p = pValues[i]
    y = -11.63*math.log(-2.36279*(p - 1))
    # append y i times, so the count grows linearly with p
    for j in range(i):
        pos.append(y)

avg = np.average(pos)
std = np.std(pos)

hist, bins = np.histogram(pos, bins=100)
width = 0.7*(bins[1]-bins[0])
center = (bins[:-1]+bins[1:])/2
plt.bar(center, hist, align='center', width=width)
plt.show()

The histogram mostly looks right, but certain values break the trend. For example, -5.88, which should have a frequency count of about 30, shows up at about 70. I think Python sees two nearby values and simply lumps them together, but I'm not sure how to fix it. If it's only the histogram that is wrong, that doesn't matter, since I don't really need it. I just need the list, provided it is right in the first place.


Answer 1:


I think the underlying issue is that your bin size is uniform, whereas the differences between the unique values in pos scale exponentially. Because of that you'll always end up either with weird 'spikes' where two nearby unique values fall within the same bin, or lots of empty bins (especially if you just increase the bin count to get rid of the 'spikes').
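To see this effect without plotting, here is a rough diagnostic sketch. It rebuilds pos the same way the question does, then counts how many distinct values land in each of the 100 uniform bins. The names distinct_per_bin, shared and empty are my own and not from the original code, and range assumes Python 3.

```python
import math
import numpy as np

ceiling_value = 0.82
n = int(100 * ceiling_value)
p_values = np.linspace(0.0, ceiling_value, num=n)

# Rebuild pos as in the question: the i-th value is appended i times.
pos = []
for i in range(n):
    y = -11.63 * math.log(-2.36279 * (p_values[i] - 1))
    pos.extend([y] * i)

# Same uniform binning as in the question.
hist, bins = np.histogram(pos, bins=100)

# Histogram the distinct values over the same edges to see how many
# different values share each bin.
uvals = np.unique(pos)
distinct_per_bin, _ = np.histogram(uvals, bins=bins)

shared = np.flatnonzero(distinct_per_bin > 1)  # bins that produce 'spikes'
empty = np.flatnonzero(hist == 0)              # bins that produce gaps
print("bins holding more than one distinct value:", shared.size)
print("empty bins:", empty.size)
```

Any bin listed in shared sums the counts of two neighbouring values, which is what shows up as a spike in the bar chart.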

You could try setting your bins according to the actual unique values in pos, so that their widths are non-uniform:

# the " + [10,]" forces the rightmost bin edge to == 10
uvals = np.unique(pos+[10,])
hist, bins = np.histogram(pos,bins=uvals)
# align='edge' puts each bar's left side at its bin's left edge
plt.bar(bins[:-1],hist,width=np.diff(bins),align='edge')





Answer 2:


I believe you're fine. I reran your code using bins = 200 instead of bins = 100 and the spikes disappeared. I think you had values that got caught on the boundaries between bins.
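If you want to confirm that the list itself is right, independent of any binning, one possible check is to count exact occurrences with np.unique and compare them with the intended counts. This is only a sketch: the variable names are mine, range assumes Python 3, and it relies on y increasing with p so that the sorted unique values come out in generation order.

```python
import math
import numpy as np

ceiling_value = 0.82
n = int(100 * ceiling_value)
p_values = np.linspace(0.0, ceiling_value, num=n)

# Rebuild pos as in the question: the i-th value is appended i times.
pos = []
for i in range(n):
    y = -11.63 * math.log(-2.36279 * (p_values[i] - 1))
    pos.extend([y] * i)

# Count exact occurrences of each distinct value, with no binning involved.
uvals, counts = np.unique(pos, return_counts=True)

# The i-th generated value should appear exactly i times (i = 1 .. n-1).
list_is_consistent = np.array_equal(counts, np.arange(1, n))
print("list matches the intended counts:", list_is_consistent)

# With 200 uniform bins, check how many distinct values share a bin at most.
_, bins200 = np.histogram(pos, bins=200)
distinct_per_bin, _ = np.histogram(uvals, bins=bins200)
print("max distinct values per bin with 200 bins:", distinct_per_bin.max())
```

If the first check passes, the spikes come only from how the histogram groups the values, not from the list.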



Source: https://stackoverflow.com/questions/17753501/numpy-histogram-representing-floats-with-approximate-values-as-the-same
