Pandas histogram ignoring invalid data; limit x-range

Question

I have a dataframe which consists of a mix of text and numerical data, with some values of -999 representing missing or invalid data. As a toy example, let's say it looks like this:

import pandas as pd
import matplotlib.pyplot as plt

dictOne = {'Name':['First', 'Second', 'Third', 'Fourth', 'Fifth', 'Sixth', 'Seventh', 'Eighth', 'Ninth'],
           "A":[1, 2, -3, 4, 5, -999, 7, -999, 9],
           "B":[4, 5, 6, 5, 3, -999, 2, 9, 5],
           "C":[7, -999, 10, 5, 8, 6, 8, 2, 4]}
df2 = pd.DataFrame(dictOne)

df2.hist('C', bins = 1000)
plt.xlim=([0, 10])

This gives a histogram that still includes the -999 values.

I'm trying to exclude the -999 values. Is there an easy way in Pandas to do this?

Also, in my example code, why is the x-axis not limited to the range [0, 10]?

Answer 1:

Filter out the -999 rows before plotting: df2[df2['C'] > -999].hist('C'). With those rows gone, the axis range fits the remaining data, so specifying 1000 bins is not necessary.
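A minimal sketch of the filtering approach, using column C from the question. It also touches the second part of the question: `plt.xlim=([0, 10])` assigns a list to the name `plt.xlim` instead of calling the function, so no limit is applied. Calling `ax.set_xlim(0, 10)` (or `plt.xlim(0, 10)`) does set the range. The names `valid` and `ax` and the choice of the `Agg` backend are illustrative assumptions, not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

df2 = pd.DataFrame({"C": [7, -999, 10, 5, 8, 6, 8, 2, 4]})

# Keep only the rows whose C value is not the -999 sentinel.
valid = df2[df2["C"] != -999]

# DataFrame.hist returns an array of Axes; take the single one it contains.
ax = valid.hist("C").ravel()[0]

# Call the method (no "=" sign) to limit the x-axis.
ax.set_xlim(0, 10)
```

In an interactive session, `plt.show()` would display the figure afterwards.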

Answer 2:

Instead of bins=1000, you can specify the bin edges explicitly. Values outside the edges, such as -999, are then ignored:

df2.hist('C', bins=range(0, 11))

Or, if you want each bar centered on an integer value:

import numpy as np

df2.hist('C', bins=np.arange(0.5, 11, 1))
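To check what half-integer edges do, the counts can be computed directly with `numpy.histogram`, which applies the same binning rule as the plot. This is only a sketch on the C values from the question; the names `c_values`, `edges`, `counts` and `centers` are illustrative.

```python
import numpy as np

c_values = np.array([7, -999, 10, 5, 8, 6, 8, 2, 4])

# Edges 0.5, 1.5, ..., 10.5 give ten bins, each centered on an integer 1..10.
edges = np.arange(0.5, 11, 1)

# Values outside the edges (here -999) are not counted.
counts, _ = np.histogram(c_values, bins=edges)
centers = (edges[:-1] + edges[1:]) / 2

print(dict(zip(centers.tolist(), counts.tolist())))
# counts -> [0, 1, 0, 1, 1, 1, 1, 2, 0, 1]
```

Only eight of the nine values are counted, because -999 lies outside the edges.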


Source: https://stackoverflow.com/questions/56013641/pandas-histogram-ignoring-invalid-data-limit-x-range

Tags

python

pandas

histogram
