I have JSON files with the following structure:
{"names":[{"name":"John","lastName":"Doe"}, {"name":"John","lastName":"Marcus"}, {"name":"Davi
Just for future reference, the solution looks like this:
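import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.explode

// Explode names.name into a temp key, dedupe on it, then drop the key
val uniqueNames = allNames
  .withColumn("DEDUP_NAME_KEY", explode(new Column("names.name")))
  .cache()
  .dropDuplicates(Array("DEDUP_NAME_KEY"))
  .drop("DEDUP_NAME_KEY")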
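In case it helps anyone reading later, here is a rough plain-Scala sketch (no Spark needed) of what I understand the explode / dropDuplicates / drop chain to be doing. The `Person` and `Record` case classes and the sample data are made up for illustration, and `exploded` is just a name I picked. One caveat: `distinctBy` (Scala 2.13+) keeps the first occurrence, whereas as far as I know Spark does not promise which of the duplicate rows survives.

```scala
// Plain-Scala model of explode + dropDuplicates + drop (illustrative only).
// Person and Record are hypothetical stand-ins for the JSON schema.
case class Person(name: String, lastName: String)
case class Record(names: Seq[Person])

// Hypothetical sample data: two records that both contain the name "John".
val allNames: Seq[Record] = Seq(
  Record(Seq(Person("John", "Doe"), Person("John", "Marcus"))),
  Record(Seq(Person("John", "Smith"), Person("Jane", "Roe")))
)

// "explode": emit one (key, record) pair per element of record.names,
// so a record with two names appears twice.
val exploded: Seq[(String, Record)] =
  allNames.flatMap(r => r.names.map(p => (p.name, r)))

// "dropDuplicates" on the key, then "drop" the key column again.
// distinctBy keeps the first pair seen for each key.
val uniqueNames: Seq[Record] = exploded.distinctBy(_._1).map(_._2)
```

Note that each surviving row is still the whole original record, so the result has one row per distinct name, not one row per distinct record.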