Coalesce duplicate columns in spark dataframe

Submitted by 假装没事ソ on 2019-12-22 00:25:33

Question


I have a Spark DataFrame that can contain duplicate columns (the same column name appearing more than once, with different row values). Is it possible to coalesce those duplicate columns and get a DataFrame without any duplicate columns?

Example:

+-----+------+-----+-------+
| name|upload| name|upload1|
+-----+------+-----+-------+
| null|  null|alice|    101|
| null|  null|  bob|    231|
|alice|   100| null|   null|
|  bob|    23| null|   null|
+-----+------+-----+-------+

should become:

+-----+------+-------+
| name|upload|upload1|
+-----+------+-------+
|alice|  null|    101|
|  bob|  null|    231|
|alice|   100|   null|
|  bob|    23|   null|
+-----+------+-------+
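The question's frame holds two columns that are both called `name`. Spark allows this, but it cannot resolve `name` by name afterwards because the reference is ambiguous. A minimal sketch, assuming a local `SparkSession` and the Scala Spark API, that rebuilds that input and then gives every column a unique name by position with `toDF`. The object and method names (`DuplicateNames`, `questionInput`, `uniqueNames`) and the replacement name `name1` are illustrative choices, not part of the question.

```scala
import scala.util.Try
import org.apache.spark.sql.{DataFrame, SparkSession}

object DuplicateNames {
  // Hypothetical helper: builds the question's input, where two columns
  // are literally named "name". Explicit Option types keep the tuple encoder clear.
  def questionInput(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq[(Option[String], Option[Int], Option[String], Option[Int])](
      (None,          None,      Some("alice"), Some(101)),
      (None,          None,      Some("bob"),   Some(231)),
      (Some("alice"), Some(100), None,          None),
      (Some("bob"),   Some(23),  None,          None)
    ).toDF("name", "upload", "name", "upload1")
  }

  // Hypothetical helper: toDF(...) renames columns by position, so it works
  // even when names repeat. "name1" is an illustrative replacement name.
  def uniqueNames(df: DataFrame): DataFrame =
    df.toDF("name", "upload", "name1", "upload1")

  def main(args: Array[String]): Unit = {
    // Assumption: a local SparkSession for experimentation.
    val spark = SparkSession.builder().master("local[*]").appName("duplicate-names").getOrCreate()
    val dup = questionInput(spark)
    // Selecting "name" is ambiguous on this frame, so analysis fails.
    println(Try(dup.select("name")).isFailure)
    uniqueNames(dup).printSchema()
    spark.stop()
  }
}
```

Once the names are unique, the duplicate can be referenced and merged like any other column.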

Answer 1:


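import org.apache.spark.sql.functions.coalesce
import spark.implicits._ // needed for toDF and $"..." outside spark-shell

// The second "name" column is called "name1" so it can be referenced unambiguously
val DF1 = Seq(
  (None,          None,      Some("alice"), Some(101)),
  (None,          None,      Some("bob"),   Some(231)),
  (Some("alice"), Some(100), None,          None),
  (Some("bob"),   Some(23),  None,          None)).
    toDF("name", "upload", "name1", "upload1")

// coalesce keeps the first non-null value; the redundant column is then dropped
DF1.withColumn("name", coalesce($"name", $"name1")).drop("name1").show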

+-----+------+-------+
| name|upload|upload1|
+-----+------+-------+
|alice|  null|    101|
|  bob|  null|    231|
|alice|   100|   null|
|  bob|    23|   null|
+-----+------+-------+
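The answer merges one known pair of columns. A generalized sketch, assuming that every set of same-named columns should collapse into a single column holding the first non-null value from left to right: it renames all columns positionally, groups them by their original name, and applies `coalesce` to each group. `CoalesceDuplicates` and `coalesceDuplicateColumns` are hypothetical names, not Spark APIs, and the sketch assumes same-named columns have compatible types, as `coalesce` requires.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{coalesce, col}

object CoalesceDuplicates {
  // Hypothetical helper: one output column per distinct name, in order of
  // first appearance, each holding the first non-null value of its group.
  def coalesceDuplicateColumns(df: DataFrame): DataFrame = {
    val names = df.columns.toSeq
    // Positional temporary names make every column addressable without ambiguity.
    val tmp = names.indices.map(i => s"__c$i")
    val renamed = df.toDF(tmp: _*)
    val merged: Seq[Column] = names.distinct.map { name =>
      // All positions that carried this name, left to right.
      val group = names.indices.filter(i => names(i) == name).map(i => col(tmp(i)))
      coalesce(group: _*).as(name)
    }
    renamed.select(merged: _*)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local SparkSession for experimentation.
    val spark = SparkSession.builder().master("local[*]").appName("coalesce-duplicates").getOrCreate()
    import spark.implicits._
    val df = Seq[(Option[String], Option[Int], Option[String], Option[Int])](
      (None,          None,      Some("alice"), Some(101)),
      (None,          None,      Some("bob"),   Some(231)),
      (Some("alice"), Some(100), None,          None),
      (Some("bob"),   Some(23),  None,          None)
    ).toDF("name", "upload", "name", "upload1")
    coalesceDuplicateColumns(df).show()
    spark.stop()
  }
}
```

Columns with different names, such as `upload` and `upload1`, stay separate, which matches the desired output in the question.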


Source: https://stackoverflow.com/questions/48109064/coalesce-duplicate-columns-in-spark-dataframe
