How to “select distinct” across multiple data frame columns in pandas?

渐次进展 2020-12-02 14:56

I'm looking for a way to do the equivalent of the SQL

SELECT DISTINCT col1, col2 FROM dataframe_table

The pandas sql comparison doesn't

6 Answers
  •  予麋鹿 (OP)
    2020-12-02 15:40

    To solve a similar problem, I'm using groupby:

    print(f"Distinct entries: {len(df.groupby(['col1', 'col2']))}")
    

    Whether that's appropriate will depend on what you want to do with the result, though (in my case, I just wanted the equivalent of COUNT DISTINCT as shown).
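If the distinct rows themselves are needed rather than just their count, one common approach is `drop_duplicates` on the selected columns. The sketch below is only an illustration: the sample data and the extra `col3` column are made up, and only `col1` and `col2` come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the names col1 and col2 come from the question.
df = pd.DataFrame({
    "col1": ["a", "a", "b", "b", "a"],
    "col2": [1, 1, 2, 3, 1],
    "col3": [10, 20, 30, 40, 50],
})

# COUNT DISTINCT, as in the answer above: len() of a GroupBy object
# is the number of groups, i.e. the number of unique (col1, col2) pairs.
n_groups = len(df.groupby(["col1", "col2"]))

# SELECT DISTINCT col1, col2: keep the first row of each unique pair.
distinct = df[["col1", "col2"]].drop_duplicates().reset_index(drop=True)

print(f"Distinct entries: {n_groups}")
print(distinct)
```

One difference to be aware of: by default `groupby` leaves out rows whose key columns contain NaN, while `drop_duplicates` keeps them, so the two counts can disagree on data with missing values.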
