Question
I have a pandas Series with numerical data and I want to find its unique values together with how often each one appears. I use the standard procedure:
# Assuming my_data holds the name of a column in the pd.DataFrame df
unique = df[my_data].value_counts()
print(unique)
And here are the results that I get:
# -------------------OUTPUT
-0.010000 46483
-0.010000 16895
-0.027497 12215
-0.294492 11915
0.027497 11397
What I don't get is why the "same value" (-0.01) occurs twice. Is there an internal threshold (some small tolerance), or am I doing something wrong?
Update
If I write the DataFrame to a CSV file and read it back, I get the correct result, namely:
# -------------------OUTPUT
-0.010000 63378
-0.027497 12215
-0.294492 11915
0.027497 11397
Solution
Based on the discussion, I found the source of the problem and the solution. As mentioned, it is a floating-point precision issue, which can be solved by rounding the values. However, I would not have been able to see that without
pd.set_option('display.float_format', repr)
Thanks a lot for the help!!
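To illustrate why changing the display format exposes the problem, here is a minimal sketch. It reuses the sample values from the answer's demo rather than the original data, and it swaps `repr` for the formatter `'{:.17g}'.format`. That swap is an assumption made for portability, since on recent NumPy versions `repr` of a NumPy scalar may include a type prefix.

```python
import pandas as pd

# Sample values taken from the demo in the answer, not the original data.
s = pd.Series([-0.01, -0.01, -0.01000000000123, 0.2])

# Default display: the printed values are shortened, so the third
# entry looks identical to the first two.
default_text = s.to_string()

# Up to 17 significant digits is enough to tell any two float64 values apart.
with pd.option_context('display.float_format', '{:.17g}'.format):
    full_text = s.to_string()

print(default_text)
print(full_text)
```

`pd.option_context` restores the previous setting when the block exits, whereas `pd.set_option` changes it for the rest of the session.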
Answer 1:
I think it's a float precision issue similar to the following one:
In [1]: 0.1 + 0.2
Out[1]: 0.30000000000000004
In [2]: 0.1 + 0.2 == 0.3
Out[2]: False
so try this:
df[my_data].round(6).value_counts()
UPDATE:
Demo:
In [14]: s = pd.Series([-0.01, -0.01, -0.01000000000123, 0.2])
In [15]: s
Out[15]:
0 -0.01
1 -0.01
2 -0.01
3 0.20
dtype: float64
In [16]: s.value_counts()
Out[16]:
-0.01 2
-0.01 1
0.20 1
dtype: int64
In [17]: s.round(6).value_counts()
Out[17]:
-0.01 3
0.20 1
dtype: int64
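The choice of 6 decimals is a judgment call. One way to check it, sketched here on the same demo values (an illustration, not something from the original thread), is to look at the gaps between neighbouring distinct values before rounding:

```python
import numpy as np
import pandas as pd

s = pd.Series([-0.01, -0.01, -0.01000000000123, 0.2])

# Sort the distinct values so that near-duplicates end up next to each other.
values = np.sort(s.unique())

# Differences between neighbours; a tiny gap points to floating-point noise
# rather than a genuinely different value.
gaps = np.diff(values)
print(gaps)

# Rounding to 6 decimals merges values whose gap is far below 1e-6
# while keeping values that are genuinely different apart.
counts = s.round(6).value_counts()
print(counts)
```

If the genuine values in the data can differ by less than the rounding step, a finer precision would be needed to avoid merging them.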
Source: https://stackoverflow.com/questions/50105629/bizarre-behaviour-of-pandas-series-value-counts