Show distinct column values in a PySpark DataFrame (Python)
Please suggest a PySpark DataFrame alternative to Pandas' `df['col'].unique()`. I want to list all the unique values in a PySpark DataFrame column, but not via the SQL route (`registerTempTable` followed by a SQL query for the distinct values). I also don't need `groupBy` plus `countDistinct`; I want the distinct VALUES in that column, not a count.

Let's assume we're working with the following representation of the data (two columns, `k` and `v`, where `k` contains three entries, two of them unique):

```
+---+---+
|  k|  v|
+---+---+
|foo|  1|
|bar|  2|
|foo|  3|
+---+---+
```

With a Pandas DataFrame:

```python
import pandas as pd

p_df = pd.DataFrame([("foo", 1), ("bar", 2), ("foo", 3)], columns=("k", "v"))
p_df["k"].unique()
# array(['foo', 'bar'], dtype=object)
```
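For reference, here is a minimal sketch of the kind of PySpark approach being asked about: project the column, deduplicate it with `distinct()`, and pull the plain values out of the collected `Row` objects. It assumes Spark 2.x or later and a `SparkSession` named `spark` (created below for self-containment); `s_df` simply mirrors the table above.

```python
from pyspark.sql import SparkSession

# Assumption: a local SparkSession; in a notebook or shell one usually already exists.
spark = SparkSession.builder.appName("distinct-demo").getOrCreate()

# Mirror the example table: three rows, two unique values in column k.
s_df = spark.createDataFrame([("foo", 1), ("bar", 2), ("foo", 3)], ["k", "v"])

# distinct() on a single-column projection keeps one row per unique value.
distinct_rows = s_df.select("k").distinct().collect()

# collect() returns Row objects; extract the underlying values.
values = [row.k for row in distinct_rows]
print(values)  # e.g. ['bar', 'foo'] (row order is not guaranteed)
```

Two caveats on this sketch: `distinct()` gives no ordering guarantee, and `collect()` brings all the results to the driver, so it is only appropriate when the number of distinct values is small enough to fit in driver memory.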