Question
All,
I have a column in a dataframe that looks like this:
allHoldingsFund['BrokerMixed']
Out[419]:
78 ML
81 CITI
92 ML
173 CITI
235 ML
262 ML
264 ML
25617 GS
25621 CITI
25644 CITI
25723 GS
25778 CITI
25786 CITI
25793 GS
25797 CITI
Name: BrokerMixed, Length: 2554, dtype: object
Although the column has dtype object, I am not able to group by it or even extract its unique values. For example, when I do:
allHoldingsFund['BrokerMixed'].unique()
I get an error:
uniques = table.unique(values)
File "pandas/_libs/hashtable_class_helper.pxi", line 1340, in pandas._libs.hashtable.PyObjectHashTable.unique
TypeError: unhashable type: 'numpy.ndarray'
I also get an error when I do group by.
Any help is welcome. Thank you.
Answer 1:
First, I would suggest checking the type of your column. You can try the following:
print (type(allHoldingsFund['BrokerMixed']))
If this is a pandas Series, you may try:
allHoldingsFund['BrokerMixed'].reset_index()['BrokerMixed'].unique()
and check if this works for you.
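Since type() on the column only reports the container type, it may also help to look at the type of each element. The following is a minimal sketch, assuming a small made-up Series in place of allHoldingsFund['BrokerMixed'] (the values and the single array entry are invented for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for allHoldingsFund['BrokerMixed']
s = pd.Series(['ML', 'CITI', np.array(['GS']), 'ML'], name='BrokerMixed')

# Count how many elements there are of each Python type (by type name)
type_counts = s.map(lambda x: type(x).__name__).value_counts()
print(type_counts)

# Boolean mask marking the rows that hold a NumPy array
is_array = s.map(lambda x: isinstance(x, np.ndarray))
print(s[is_array])
```

Any row flagged by the mask is a candidate for the unhashable-type error.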
Answer 2:
It looks like you have a NumPy array in your series. NumPy arrays can't be hashed, and pd.Series.unique, like set, relies on hashing.
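As a small illustration of the hashing point, the sketch below uses only NumPy and built-in Python; the variable names are made up for the example:

```python
import numpy as np

arr = np.array([1, 2, 3])

# Hashing an ndarray raises TypeError, the same error that unique() surfaces
try:
    hash(arr)
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# A tuple holding the same values is hashable, so it can be stored in a set
as_tuple = tuple(arr.tolist())
unique_values = {as_tuple, (1, 2, 3), 'ML'}
print(len(unique_values))
```

The set ends up with two members because the converted tuple compares equal to (1, 2, 3).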
If you can't ensure that your series data consists only of strings, you can convert NumPy arrays to tuples before calling pd.Series.unique:
import numpy as np
import pandas as pd

s = pd.Series([np.array([1, 2, 3]), 1, 'hello', 'test', 1, 'test'])

def tuplizer(x):
    # Tuples are hashable; arrays and lists are not
    return tuple(x) if isinstance(x, (np.ndarray, list)) else x

res = s.apply(tuplizer).unique()
print(res)
array([(1, 2, 3), 1, 'hello', 'test'], dtype=object)
Of course, this means your data type information is lost in the result, but at least you get to see your "unique" NumPy arrays, provided they are 1-dimensional.
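For arrays with more than one dimension, tuple(x) would produce a tuple of arrays, which is still unhashable. One possible workaround is a recursive conversion to nested tuples. This is only a sketch: deep_tuplizer is a hypothetical helper, not part of pandas or NumPy, and the sample data is invented:

```python
import numpy as np
import pandas as pd

def deep_tuplizer(x):
    """Hypothetical helper: recursively turn arrays and lists into nested tuples."""
    if isinstance(x, np.ndarray):
        x = x.tolist()
    if isinstance(x, list):
        return tuple(deep_tuplizer(item) for item in x)
    return x

# Build an object array slot by slot so the 2-D arrays stay as single elements
values = np.empty(3, dtype=object)
values[0] = np.array([[1, 2], [3, 4]])
values[1] = np.array([[1, 2], [3, 4]])
values[2] = 'test'
s = pd.Series(values)

res = s.apply(deep_tuplizer).unique()
print(len(res))
```

The two equal 2-D arrays collapse into one nested tuple, so the result holds two unique values.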
Answer 3:
You have an array in your data column. You could try the following:
allHoldingsFund['BrokerMixed'].apply(lambda x: str(x)).unique()
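One caveat with str(x) is that a one-element array such as np.array(['GS']) becomes the string "['GS']", which would not group together with a plain 'GS'. The sketch below shows one way to handle that before grouping. The DataFrame, the 'Qty' column, the 'BrokerClean' column, and the unwrap helper are all assumptions made for illustration, not part of the original data:

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for allHoldingsFund; 'Qty' is invented
df = pd.DataFrame({
    'BrokerMixed': ['ML', 'CITI', np.array(['GS']), 'GS', 'ML'],
    'Qty': [10, 20, 30, 40, 50],
})

def unwrap(x):
    """Hypothetical helper: unwrap one-element arrays, stringify larger ones."""
    if isinstance(x, np.ndarray):
        return x.item() if x.size == 1 else str(x)
    return x

# Plain stringification keeps the brackets of the array representation
stringified = str(np.array(['GS']))
print(stringified)

# Normalize into a new hashable column, then group on it
df['BrokerClean'] = df['BrokerMixed'].map(unwrap)
totals = df.groupby('BrokerClean')['Qty'].sum()
print(totals)
```

Whether unwrapping is appropriate depends on why arrays ended up in the column in the first place, so it is worth inspecting the affected rows first.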
Source: https://stackoverflow.com/questions/51675151/df-x-unique-and-typeerror-unhashable-type-numpy-ndarray