I am trying to obtain all rows in a dataframe where two flags are set to '1', and, subsequently, all rows where only one of the two flags is set to '1' and the other is not.
To filter on null values, try:
foo_df = df.filter((df.foo == 1) & (df.bar.isNull()))
https://spark.apache.org/docs/1.6.2/api/python/pyspark.sql.html#pyspark.sql.Column.isNull
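For completeness, both queries from the question can be expressed this way (a minimal sketch, assuming the flags are numeric columns named foo and bar, as in the example further down):
from pyspark.sql.functions import col

# rows where both flags are set to 1
both_df = df.filter((col("foo") == 1) & (col("bar") == 1))

# rows where foo is 1 and bar is missing; isNull() is required here,
# since a plain comparison against NULL never evaluates to true
foo_only_df = df.filter((col("foo") == 1) & col("bar").isNull())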
Why is it not filtering?
Because it is SQL, and NULL indicates a missing value. Because of that, any comparison to NULL, other than IS NULL and IS NOT NULL, is undefined. You need either:
col("bar").isNull() | (col("bar") != 1)
or
coalesce(col("bar") != 1, lit(True))
or (PySpark >= 2.3):
col("bar").eqNullSafe(1)
if you want null safe comparisons in PySpark.
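The undefined comparison is easy to verify directly: NULL = 1 and NULL != 1 both evaluate to NULL, which WHERE treats as false, so the null rows silently disappear (a quick check, run against any active SparkSession):
spark.sql("SELECT NULL = 1 AS eq, NULL != 1 AS neq, NULL IS NULL AS is_null").show()
## +----+----+-------+
## |  eq| neq|is_null|
## +----+----+-------+
## |null|null|   true|
## +----+----+-------+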
Also, 'null' is not a valid way to introduce a NULL literal. You should use None to indicate missing objects.
from pyspark.sql.functions import col, coalesce, lit

# use None (not the string 'null') for missing values
df = spark.createDataFrame([
    ('a', 1, 1), ('a', 1, None), ('b', 1, 1),
    ('c', 1, None), ('d', None, 1), ('e', 1, 1)
]).toDF('id', 'foo', 'bar')
df.where((col("foo") == 1) & (col("bar").isNull() | (col("bar") != 1))).show()
## +---+---+----+
## | id|foo| bar|
## +---+---+----+
## | a| 1|null|
## | c| 1|null|
## +---+---+----+
df.where((col("foo") == 1) & coalesce(col("bar") != 1, lit(True))).show()
## +---+---+----+
## | id|foo| bar|
## +---+---+----+
## | a| 1|null|
## | c| 1|null|
## +---+---+----+
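And on PySpark >= 2.3 the null-safe variant gives the same result (a sketch; ~ negates the column expression, equivalent to NOT (bar <=> 1) in SQL):
df.where((col("foo") == 1) & ~col("bar").eqNullSafe(1)).show()
## +---+---+----+
## | id|foo| bar|
## +---+---+----+
## |  a|  1|null|
## |  c|  1|null|
## +---+---+----+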