Show distinct column values in a PySpark DataFrame (Python)
Question: Please suggest a PySpark DataFrame alternative to Pandas df['col'].unique(). I want to list all the unique values in a PySpark DataFrame column, not the SQL-style way (registerTempTable, then a SQL query for the distinct values). I also don't need groupBy followed by countDistinct; instead, I want to check the distinct VALUES in that column.

Answer 1: Let's assume we're working with the following representation of the data (two columns, k and v, where k contains three entries, two of them unique):

+---+---+
|  k|  v|
+---+---+
|foo|  1|
|bar|  2|
|foo|  3|
+---+---+
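Since the answer is cut off at this point, here is a minimal sketch of the non-SQL approach the question asks about (assuming a SparkSession named spark is available; the sample rows mirror the table above):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Sample data matching the table above: k has three entries, two unique.
df = spark.createDataFrame(
    [("foo", 1), ("bar", 2), ("foo", 3)],
    ["k", "v"],
)

# Distinct values as a DataFrame (stays distributed on the executors):
df.select("k").distinct().show()

# Distinct values collected to the driver as a plain Python list,
# analogous to Pandas df['col'].unique():
unique_k = [row.k for row in df.select("k").distinct().collect()]
print(unique_k)  # e.g. ['bar', 'foo'] (order is not guaranteed)
```

Note that distinct() runs distributed across the cluster, while collect() pulls the result to the driver, so it should only be called once the set of distinct values is known to be small enough to fit in driver memory.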