Remove duplicates from PySpark array column
Question: I have a PySpark DataFrame that contains an ArrayType(StringType()) column. This column contains duplicate strings inside the array, which I need to remove. For example, one row entry could look like [milk, bread, milk, toast]. Let's say my DataFrame is named df and my column is named arraycol. I need something like:

df = df.withColumn("arraycol_without_dupes", F.remove_dupes_from_array("arraycol"))

My intuition was that there exists a simple solution to this, but after browsing stackoverflow