Question
How can I replace empty values in the column Field1 of DataFrame df?
Field1  Field2
        AA
12      BB
This command does not produce the expected result:
df.na.fill("Field1",Seq("Anonymous"))
The expected result:
Field1     Field2
Anonymous  AA
12         BB
Answer 1:
Fill: Returns a new DataFrame that replaces null or NaN values in numeric columns with value.
Two things:
- An empty string is not null or NaN, so fill will not touch it; you'll have to use a case statement (when/otherwise) for that.
- Fill with a text value only applies to string columns, so a numeric column is left unchanged.
Failing Example (Filling a Null in a Numeric Column with Text):
scala> a.show
+----+---+
|  f1| f2|
+----+---+
|null| AA|
|  12| BB|
+----+---+
scala> a.na.fill("Anonymous", Seq("f1")).show
+----+---+
|  f1| f2|
+----+---+
|null| AA|
|  12| BB|
+----+---+
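If the goal really is to show a text placeholder in a column that is currently numeric, one option is to cast the column to a string first so that the string overload of fill applies. The following is only a sketch under assumptions: the object name FillNumericAsText, the helper castThenFill, and the sample data standing in for `a` are all made up for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object FillNumericAsText {
  // Hypothetical helper: cast the column to string, then fill its nulls with text.
  def castThenFill(df: DataFrame, column: String, default: String): DataFrame =
    df.withColumn(column, col(column).cast("string"))
      .na.fill(default, Seq(column))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("fill-numeric-as-text")
      .getOrCreate()
    import spark.implicits._

    // Assumed stand-in for `a`: f1 is a nullable integer column.
    val a = Seq[(Option[Int], String)]((None, "AA"), (Some(12), "BB")).toDF("f1", "f2")

    castThenFill(a, "f1", "Anonymous").show()
    spark.stop()
  }
}
```

Note that after the cast, f1 is a string column, so any later numeric operations on it would need a cast back.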
Working Example (Filling a Null in a Numeric Column with a Number):
scala> a.show
+----+---+
|  f1| f2|
+----+---+
|null| AA|
|  12| BB|
+----+---+
scala> a.na.fill(1, Seq("f1")).show
+---+---+
| f1| f2|
+---+---+
|  1| AA|
| 12| BB|
+---+---+
Failing Example (Empty String instead of Null):
scala> b.show
+---+---+
| f1| f2|
+---+---+
|   | AA|
| 12| BB|
+---+---+
scala> b.na.fill(1, Seq("f1")).show
+---+---+
| f1| f2|
+---+---+
|   | AA|
| 12| BB|
+---+---+
Case Statement Fix Example:
scala> b.show
+---+---+
| f1| f2|
+---+---+
|   | AA|
| 12| BB|
+---+---+
scala> b.select(when(col("f1") === "", "Anonymous").otherwise(col("f1")).as("f1"), col("f2")).show
+---------+---+
|       f1| f2|
+---------+---+
|Anonymous| AA|
|       12| BB|
+---------+---+
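The case-statement fix above only matches the empty string. As a rough sketch, the same idea can be extended to also cover nulls and whitespace-only values, and written with withColumn so the other columns do not have to be listed. The object name FillBlankExample, the helper fillBlank, and the sample rows are assumptions for illustration, not part of the original answer.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit, trim, when}

object FillBlankExample {
  // Hypothetical helper: replace null, empty, or whitespace-only values
  // in a string column with a default; other values are kept as they are.
  def fillBlank(df: DataFrame, column: String, default: String): DataFrame =
    df.withColumn(
      column,
      when(col(column).isNull || trim(col(column)) === "", lit(default))
        .otherwise(col(column))
    )

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("fill-blank")
      .getOrCreate()
    import spark.implicits._

    // Assumed sample data: one empty string, one null, one real value.
    val df = Seq[(Option[String], String)](
      (Some(""), "AA"),
      (None, "CC"),
      (Some("12"), "BB")
    ).toDF("Field1", "Field2")

    fillBlank(df, "Field1", "Anonymous").show()
    spark.stop()
  }
}
```

Because withColumn is given the name of an existing column, it replaces that column in place and keeps the column order.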
Answer 2:
You can also try this. It should handle both empty strings and nulls:
df.show()
+------+------+
|Field1|Field2|
+------+------+
|      |    AA|
|    12|    BB|
|    12|  null|
+------+------+
df.na.replace(Seq("Field1","Field2"),Map(""-> null)).na.fill("Anonymous", Seq("Field2","Field1")).show(false)
+---------+---------+
|Field1   |Field2   |
+---------+---------+
|Anonymous|AA       |
|12       |BB       |
|12       |Anonymous|
+---------+---------+
Source: https://stackoverflow.com/questions/50260820/how-to-replace-empty-values-in-a-column-of-dataframe