In DataFrame.withColumn, how can I check if the column's value is null as a condition for the second parameter?

六眼飞鱼酱① 提交于 2021-02-08 04:59:26

问题


If I have a DataFrame called df that looks like:

+----+----+
|  a1+  a2|
+----+----+
| foo| bar|
| N/A| baz|
|null| etc|
+----+----+

I can selectively replace values like so:

val df2 = df.withColumn("a1", when($"a1" === "N/A", $"a2"))

so that df2 looks like:

+----+----+
|  a1+  a2|
+----+----+
| foo| bar|
| baz| baz|
|null| etc|
+----+----+

but why can't I check if it's null, like:

val df3 = df2.withColumn("a1", when($"a1" === null, $"a2"))

so that I get:

+----+----+
|  a1+  a2|
+----+----+
| foo| bar|
| baz| baz|
| etc| etc|
+----+----+

Edit: $"a1".isNull doesn't seem to work. Could it be because of how I'm constructing the dataframe I'm using to test, which is as follows?

val schema = StructType(
                StructField("a1", StringType, false) ::
                StructField("a2", StringType, false) :: Nil
)

val data = sc.parallelize(Array(
                Row("foo","bar"),
                Row("N/A","baz"),
                Row(null,"etc"))
)

val df = sqlContext.createDataFrame(data, schema)

I also can't use coalesce, as far as I know, because sometimes I need to use a static value instead of another column's value.

Edit again: Setting my test columns to nullable = false doesn't help.


回答1:


Because null means no value and shouldn't be checked like this.

Use isNull function:

val df3 = df2.withColumn("a1", when($"a1".isNull, $"a2"))

or coalesce, which returns first non-null value:

val df3 = df2.withColumn("a1", coalesce($"a1", $"a2"))


来源:https://stackoverflow.com/questions/43853949/in-dataframe-withcolumn-how-can-i-check-if-the-columns-value-is-null-as-a-cond

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!