How to replace empty values in a column of DataFrame?

Posted by 狂风中的少年 on 2019-12-29 09:17:14

Question


How can I replace empty values in the column Field1 of a DataFrame df?

Field1 Field2
       AA
12     BB

This command does not produce the expected result:

df.na.fill("Field1",Seq("Anonymous"))

The expected result:

Field1          Field2
Anonymous       AA
12              BB

Answer 1:


Fill: Returns a new DataFrame that replaces null or NaN values in numeric columns with value.

Two things:

  1. An empty string is neither null nor NaN, so fill does not touch it; you'll have to use a case expression (when/otherwise) for that.
  2. fill with a text value only applies to string columns, so passing text for a numeric column leaves its nulls unchanged.
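The second point can be sketched as follows. This is a minimal illustration, assuming a local SparkSession and made-up data with an integer column f1Int and a string column f1Str (both names are hypothetical, not from the question). It shows that a string fill value only reaches the string column, and that casting the numeric column to string first is one possible workaround.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object FillTypeDemo {
  // Local session for illustration; inside spark-shell the `spark` value already exists.
  val spark: SparkSession = SparkSession.builder()
    .master("local[1]")
    .appName("fill-type-demo")
    .getOrCreate()
  import spark.implicits._

  // Hypothetical data: the same values once as nullable integers, once as strings.
  val df: DataFrame = Seq[(Option[Int], Option[String], String)](
    (None, None, "AA"),
    (Some(12), Some("12"), "BB")
  ).toDF("f1Int", "f1Str", "f2")

  // A string fill value is applied to string columns only; f1Int keeps its null.
  val filled: DataFrame = df.na.fill("Anonymous", Seq("f1Int", "f1Str"))

  // Possible workaround: cast the numeric column to string, then fill.
  val castThenFilled: DataFrame = df
    .withColumn("f1Int", col("f1Int").cast("string"))
    .na.fill("Anonymous", Seq("f1Int"))

  def main(args: Array[String]): Unit = {
    filled.show()
    castThenFilled.show()
    spark.stop()
  }
}
```

Note that after the cast the column is a string column, so any later numeric operations on it would need a cast back.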

Failing Example (Filling Nulls in a Numeric Column with Text):

scala> a.show
+----+---+
|  f1| f2|
+----+---+
|null| AA|
|  12| BB|
+----+---+

scala> a.na.fill("Anonymous", Seq("f1")).show
+----+---+
|  f1| f2|
+----+---+
|null| AA|
|  12| BB|
+----+---+

Working Example (Filling Nulls in a Numeric Column with a Number):

scala> a.show
+----+---+
|  f1| f2|
+----+---+
|null| AA|
|  12| BB|
+----+---+


scala> a.na.fill(1, Seq("f1")).show
+---+---+
| f1| f2|
+---+---+
|  1| AA|
| 12| BB|
+---+---+

Failing Example (Empty String instead of Null):

scala> b.show
+---+---+
| f1| f2|
+---+---+
|   | AA|
| 12| BB|
+---+---+


scala> b.na.fill(1, Seq("f1")).show
+---+---+
| f1| f2|
+---+---+
|   | AA|
| 12| BB|
+---+---+

Case Statement Fix Example:

scala> b.show
+---+---+
| f1| f2|
+---+---+
|   | AA|
| 12| BB|
+---+---+


scala> b.select(when(col("f1") === "", "Anonymous").otherwise(col("f1")).as("f1"), col("f2")).show
+---------+---+
|       f1| f2|
+---------+---+
|Anonymous| AA|
|       12| BB|
+---------+---+
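The same idea can be wrapped so that null, empty, and whitespace-only values are all treated alike. The sketch below is only an illustration under stated assumptions: fillBlank is a hypothetical helper name, the column is assumed to be a string column, and the sample data is made up.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit, trim, when}

object BlankToDefault {
  // Hypothetical helper: replaces null, empty, or whitespace-only strings
  // in one string column with a default value; other columns are unchanged.
  def fillBlank(df: DataFrame, name: String, default: String): DataFrame = {
    val c = col(name)
    df.withColumn(name, when(c.isNull || trim(c) === "", lit(default)).otherwise(c))
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration; inside spark-shell `spark` already exists.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("blank-to-default")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample rows: one empty string, one value, one null.
    val b = Seq[(String, String)](("", "AA"), ("12", "BB"), (null, "CC")).toDF("f1", "f2")
    fillBlank(b, "f1", "Anonymous").show()
    spark.stop()
  }
}
```

Because withColumn is called with an existing column name, the column is replaced in place and the column order stays the same, so no explicit select of the remaining columns is needed.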



Answer 2:


You can also try this, which might handle blank, empty, and null values alike:

df.show()
+------+------+
|Field1|Field2|
+------+------+
|      |    AA|
|    12|    BB|
|    12|  null|
+------+------+

df.na.replace(Seq("Field1", "Field2"), Map("" -> null)).na.fill("Anonymous", Seq("Field2", "Field1")).show(false)

+---------+---------+
|Field1   |Field2   |
+---------+---------+
|Anonymous|AA       |
|12       |BB       |
|12       |Anonymous|
+---------+---------+   


Source: https://stackoverflow.com/questions/50260820/how-to-replace-empty-values-in-a-column-of-dataframe
