What\'s the difference between selecting with a where clause and filtering in Spark?
Are there any use cases in which one is more appropriate than the other one?
According to spark documentation "where()
is an alias for filter()
"
filter(condition)
Filters rows using the given condition.
where()
is an alias for filter()
.
Parameters: condition – a Column
of types.BooleanType
or a string of SQL expression.
>>> df.filter(df.age > 3).collect()
[Row(age=5, name=u'Bob')]
>>> df.where(df.age == 2).collect()
[Row(age=2, name=u'Alice')]
>>> df.filter("age > 3").collect()
[Row(age=5, name=u'Bob')]
>>> df.where("age = 2").collect()
[Row(age=2, name=u'Alice')]