I have two array fields in a data frame.
I need to compare these two arrays and get the difference as an array in a new column of the same data frame.
Since Spark 2.4.0, this can be solved easily with array_except. For example:
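import pandas as pd
from pyspark.sql import SparkSession, functions as F
spark = SparkSession.builder.getOrCreate()
# Example DataFrame with two array columns, A and B
df = spark.createDataFrame(pd.DataFrame(
    data=[[["hello", "world"], ["world"]],
          [["sample", "overflow", "text"], ["sample", "text"]]],
    columns=["A", "B"]))
# New column holding the elements of A that do not appear in B
df = df.withColumn('difference', F.array_except('A', 'B'))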
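To make the result easier to picture, here is a minimal plain-Python sketch of what array_except computes for each row, based on its documented behaviour: the elements of the first array that are not in the second, without duplicates. The helper name array_except_py is hypothetical and is not part of PySpark. Keeping elements in order of first appearance is an assumption of this sketch, and null arrays are not modelled.

```python
def array_except_py(a, b):
    """Hypothetical stand-in for array_except applied to one row."""
    exclude = set(b)   # values to drop from a
    seen = set()       # values already emitted, to avoid duplicates
    result = []
    for item in a:
        if item not in exclude and item not in seen:
            seen.add(item)
            result.append(item)
    return result


# Same rows as the example data frame: (A, B)
rows = [
    (["hello", "world"], ["world"]),
    (["sample", "overflow", "text"], ["sample", "text"]),
]

differences = [array_except_py(a, b) for a, b in rows]
print(differences)  # [['hello'], ['overflow']]
```

So the 'difference' column for the example data should hold ["hello"] for the first row and ["overflow"] for the second.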
For more operations like this on arrays, I suggest this blog post: https://www.waitingforcode.com/apache-spark-sql/apache-spark-2.4.0-features-array-higher-order-functions/read