Spark dropDuplicates based on a JSON array field

长发绾君心 2021-01-26 05:07

I have JSON files with the following structure:

{"names":[{"name":"John","lastName":"Doe"},
{"name":"John","lastName":"Marcus"},
{"name":"Davi


        
2 Answers
  •  耶瑟儿~
    2021-01-26 06:12

    Just for future reference, the solution looks like this:

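        import org.apache.spark.sql.functions.{col, explode}

        // One row per entry of names.name, deduplicated on that value;
        // the helper column is dropped afterwards.
        val uniqueNames = allNames
          .withColumn("DEDUP_NAME_KEY", explode(col("names.name")))
          .cache()
          .dropDuplicates(Array("DEDUP_NAME_KEY"))
          .drop("DEDUP_NAME_KEY")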
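
For anyone who wants to try this locally, here is a minimal sketch of the same explode-then-dropDuplicates idea. It assumes a local SparkSession; the `Name` and `Record` case classes, the `dedupByName` helper, and the sample rows are made up for illustration and only mimic the JSON shape from the question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode}

object DedupByArrayField {
  // Hypothetical case classes mirroring the JSON shape in the question.
  case class Name(name: String, lastName: String)
  case class Record(names: Seq[Name])

  // Hypothetical helper: explode names.name into a temporary key column,
  // keep one row per distinct key, then drop the key again.
  def dedupByName(df: DataFrame): DataFrame =
    df.withColumn("DEDUP_NAME_KEY", explode(col("names.name")))
      .dropDuplicates("DEDUP_NAME_KEY")
      .drop("DEDUP_NAME_KEY")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("dedup-by-array-field")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample rows standing in for the parsed JSON files.
    val allNames = Seq(
      Record(Seq(Name("John", "Doe"), Name("John", "Marcus"))),
      Record(Seq(Name("John", "Smith")))
    ).toDF()

    // Every exploded key here is "John", so a single row survives.
    dedupByName(allNames).show(truncate = false)

    spark.stop()
  }
}
```

Two things to keep in mind: which of the duplicate rows Spark keeps is not guaranteed, and a record whose array holds several distinct names can appear more than once in the result, once per distinct name.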
    
