Question
I am trying to create a 2D histogram from a Pandas data frame "rates". The X and Y axes are supposed to be transforms of the data frame columns, i.e., the X and Y values are scaled from the original frame columns, and the bin heights correspond to the number of hits in each x/y bin.
import numpy, pylab, pandas
import matplotlib.pyplot as plt
list(rates.columns.values)
['sizes', 'transfers', 'positioning']
x=(rates["sizes"]/1024./1024.)
y=((rates["sizes"]/rates["transfers"])/1024.)+rates["positioning"]
So I try to feed them into a numpy 2D histogram with
histo, xedges, yedges = numpy.histogram2d(x, y, bins=(100,100))
However, this fails with
File "<stdin>", line 1, in <module>
File "/usr/lib64/python2.7/site-packages/numpy/lib/twodim_base.py", line 650, in histogram2d
 hist, edges = histogramdd([x, y], bins, range, normed, weights)
File "/usr/lib64/python2.7/site-packages/numpy/lib/function_base.py", line 363, in histogramdd
 decimal = int(-log10(mindiff)) + 6
ValueError: cannot convert float NaN to integer
I have already dropped all NaNs from my frame with 'rates.dropna()', but judging from the error I guess that it is not caused by NaNs in my frame.
Does anybody have an idea what is going wrong here?
Answer 1:
With help from @jme I got on the right track.
I had not checked for problematic value pairs: x:y = 0.0:inf obviously cannot be a valid point in a 2D histogram, so I have to catch such cases when transforming the original values.
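A minimal sketch of catching such cases, assuming the inf values come from zeros in the "transfers" column. The sample frame and the names finite, x_clean and y_clean are made up for illustration. Note that dropna() removes NaN but leaves inf in place by default, which would explain why dropping NaNs alone did not help.

```python
import numpy
import pandas

# Hypothetical sample data; a zero in "transfers" yields inf (or NaN for 0/0)
# after the division below.
rates = pandas.DataFrame({
    "sizes": [1048576.0, 2097152.0, 0.0, 4194304.0],
    "transfers": [1.0, 2.0, 0.0, 0.0],
    "positioning": [0.5, 0.25, 0.1, 0.2],
})

x = rates["sizes"] / 1024. / 1024.
y = (rates["sizes"] / rates["transfers"]) / 1024. + rates["positioning"]

# Keep only the pairs where both coordinates are finite (drops NaN and +/-inf).
finite = numpy.isfinite(x) & numpy.isfinite(y)
x_clean = numpy.asarray(x[finite])
y_clean = numpy.asarray(y[finite])

histo, xedges, yedges = numpy.histogram2d(x_clean, y_clean, bins=(10, 10))
```

Filtering x and y with the same mask keeps the pairs aligned, so every remaining x still belongs to its original y.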
Another thing: numpy histogram had some issues for me with DataFrame series, so I had to get a proper numpy.array from each series to plot them properly, e.g.,
histo, xedges, yedges = numpy.histogram2d(numpy.array(x[1:MAX]), numpy.array(y[1:MAX]), bins=(100,100))
This slices each series up to some variable MAX before converting it.
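The conversion step can be sketched as follows, with made-up sample series and a hypothetical value for MAX. Using .iloc makes the slice explicitly positional. This sketch starts at the first element, whereas the [1:MAX] slice above starts at the second, so adjust the start index to whichever is intended.

```python
import numpy
import pandas

MAX = 5  # hypothetical cap on the number of samples

# Made-up sample series standing in for the transformed columns.
x = pandas.Series([0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5])
y = pandas.Series([10.0, 20.0, 15.0, 30.0, 25.0, 40.0, 35.0])

# Slice by position, then convert to plain float arrays for numpy.
x_arr = numpy.asarray(x.iloc[:MAX], dtype=float)
y_arr = numpy.asarray(y.iloc[:MAX], dtype=float)

histo, xedges, yedges = numpy.histogram2d(x_arr, y_arr, bins=(4, 4))
```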
Source: https://stackoverflow.com/questions/31008432/pandas-python-2d-histogram-fails-with-value-error