How to “select distinct” across multiple data frame columns in pandas?

渐次进展 2020-12-02 14:56

I'm looking for a way to do the equivalent of the SQL

SELECT DISTINCT col1, col2 FROM dataframe_table

The pandas sql comparison doesn't

6 answers
  •  我在风中等你
    2020-12-02 15:49

    A DataFrame has no unique method. If every column had the same number of unique values, df.apply(pd.Series.unique) would work; if not, the per-column results have different lengths and you will get an error (newer pandas versions may instead return a Series of arrays). Another approach is to store the unique values in a dict keyed on the column name:

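    In [111]:
    import pandas as pd

    df = pd.DataFrame({'a': [0, 1, 2, 2, 4], 'b': [1, 1, 1, 2, 2]})
    # Collect each column's unique values in a dict keyed on the column name
    d = {}
    for col in df:
        d[col] = df[col].unique()
    d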
    
    Out[111]:
    {'a': array([0, 1, 2, 4], dtype=int64), 'b': array([1, 2], dtype=int64)}
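
Note that the dict above holds the distinct values of each column separately, whereas SELECT DISTINCT col1, col2 returns distinct combinations of the two columns. A minimal sketch of one way to get the row-wise result, assuming that is what is wanted, is to select the columns and call drop_duplicates. The frame is the same example as above; the names distinct_rows, df_dup and distinct_dup are only illustrative.

```python
import pandas as pd

# Same example frame as in the answer above.
df = pd.DataFrame({'a': [0, 1, 2, 2, 4], 'b': [1, 1, 1, 2, 2]})

# Distinct (a, b) combinations, analogous to SELECT DISTINCT a, b.
# Every row of this frame is already a unique pair, so nothing is dropped.
distinct_rows = df[['a', 'b']].drop_duplicates()

# Append a copy of the first row so there is a real duplicate to remove.
df_dup = pd.concat([df, df.iloc[[0]]], ignore_index=True)

# drop_duplicates keeps the first occurrence of each pair;
# reset_index renumbers the surviving rows from 0.
distinct_dup = df_dup[['a', 'b']].drop_duplicates().reset_index(drop=True)
print(distinct_dup)
```

drop_duplicates also accepts a subset argument if the other columns should be kept while deduplicating on only some of them.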
    
