I'm trying to count empty values in a column in a DataFrame like this:
df.filter((df(colname) === null) || (df(colname) === "")).count()
As mentioned in the question, df.filter((df(colname) === null) || (df(colname) === "")).count() works for String data types, but testing shows that nulls are not handled (comparing a column to null with === never evaluates to true, so those rows are dropped by the filter).
@Psidom's answer handles both null and empty values, but does not handle NaN.
Adding a check with isNaN should handle all three cases:
df.filter(df(colName).isNull || df(colName) === "" || df(colName).isNaN).count()
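For example, here is a minimal sketch (assuming a SparkSession named spark with its implicits imported, and a hypothetical nullable Double column named "x") showing that the combined filter counts both the null and the NaN row; the === "" comparison is simply never true for a numeric column, so it does no harm there:

import spark.implicits._

// one regular value, one null (from None), one NaN
val nums = Seq(Some(1.0), None, Some(Double.NaN)).toDF("x")

nums.filter(nums("x").isNull || nums("x") === "" || nums("x").isNaN).count
// expected: 2 (the null row and the NaN row)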
You can use isNull to test the null condition:
val df = Seq((Some("a"), Some(1)), (null, null), (Some(""), Some(2))).toDF("A", "B")
// df: org.apache.spark.sql.DataFrame = [A: string, B: int]
df.filter(df("A").isNull || df("A") === "").count
// res7: Long = 2
df.filter(df("B").isNull || df("B") === "").count
// res8: Long = 1