df ['X'].unique() and TypeError: unhashable type: 'numpy.ndarray'

问题

all,

I have a column in a dataframe that looks like this:

allHoldingsFund['BrokerMixed']
Out[419]: 
78         ML
81       CITI
92         ML
173      CITI
235        ML
262        ML
264        ML
25617      GS
25621    CITI
25644    CITI
25723      GS
25778    CITI
25786    CITI
25793      GS
25797    CITI
Name: BrokerMixed, Length: 2554, dtype: object

Although the column is an object. I am not able to group by that column or even extract the unique values of that column. For example when I do:

allHoldingsFund['BrokerMixed'].unique()

I get an error

     uniques = table.unique(values)
  File "pandas/_libs/hashtable_class_helper.pxi", line 1340, in pandas._libs.hashtable.PyObjectHashTable.unique
TypeError: unhashable type: 'numpy.ndarray'

I also get an error when I do group by.

Any help is welcome. Thank you

回答1:

First I would suggest you to check what's type of your column. You may try as follows

print (type(allHoldingsFund['BrokerMixed']))

If this is a dataframe series, you may try

allHoldingsFund['BrokerMixed'].reset_index()['BrokerMixed'].unique()

and check if this works for you.

回答2:

Looks like you have a NumPy array in your series. But you can't hash NumPy arrays and pd.Series.unique, like set, relies on hashing.

If you can't ensure your series data only consists of strings, you can convert NumPy arrays to tuples before calling pd.Series.unique:

s = pd.Series([np.array([1, 2, 3]), 1, 'hello', 'test', 1, 'test'])

def tuplizer(x):
    return tuple(x) if isinstance(x, (np.ndarray, list)) else x

res = s.apply(tuplizer).unique()

print(res)

array([(1, 2, 3), 1, 'hello', 'test'], dtype=object)

Of course, this means your data type information is lost in the result, but at least you get to see your "unique" NumPy arrays, provided they are 1-dimensional.

回答3:

You have an array in your data column, you could try the following

allHoldingsFund['BrokerMixed'].apply(lambda x: str(x)).unique()

来源：https://stackoverflow.com/questions/51675151/df-x-unique-and-typeerror-unhashable-type-numpy-ndarray

标签

python

pandas

group-by