Question
What is the PySpark equivalent of the SQL LIKE operator? For example, I would like to do:
SELECT * FROM table WHERE column LIKE "*somestring*";
I'm looking for something simple like this (but it is not working):
df.select('column').where(col('column').like("*s*")).show()
Answer 1:
You can use the where and col functions to do this. where filters rows based on a condition (here, whether the column matches '%string%'), col('col_name') refers to the column, and like is the pattern-matching operator. Note that SQL LIKE uses % (any sequence of characters) and _ (a single character) as wildcards, not *:
from pyspark.sql.functions import col
df.where(col('col1').like("%string%")).show()
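For a self-contained check, here is a minimal sketch with made-up example data (the column name col1 and the sample strings are assumptions, not from the question):
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()

# Toy data: only the first and third rows contain 'string'.
df = spark.createDataFrame(
    [('some string here',), ('nothing to see',), ('another string',)],
    ['col1'],
)
# Keeps 'some string here' and 'another string'; drops 'nothing to see'.
df.where(col('col1').like('%string%')).show()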
Answer 2:
From Spark 2.0.0 onwards, the following also works, passing the condition as a SQL expression string:
df.select('column').where("column like '%s%'").show()
Answer 3:
Use the like operator.
https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#module-pyspark.sql.functions
df.filter(df.column.like('%s%')).show()
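If you need a full regular expression rather than SQL wildcards, the Column API also provides rlike, which takes a Java regex pattern (a sketch; the column name is carried over from this answer):
# rlike does a regex search anywhere in the string, so 's' matches any
# row whose value contains an 's' -- no explicit .* anchors needed.
df.filter(df.column.rlike('s')).show()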
Answer 4:
Well... there is SQL-style LIKE pattern matching:
df.select('column').where(col('column').like("%s%")).show()
Answer 5:
In PySpark you can always register the DataFrame as a table and query it with SQL. Note that the wildcard in SQL LIKE is %, not *:
df.registerTempTable('my_table')
query = """SELECT * FROM my_table WHERE column LIKE '%somestring%'"""
sqlContext.sql(query).show()
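registerTempTable and sqlContext are deprecated since Spark 2.0; the equivalent with the SparkSession API would be something like this (a sketch, assuming a spark session is in scope):
# createOrReplaceTempView replaces the deprecated registerTempTable.
df.createOrReplaceTempView('my_table')
spark.sql("SELECT * FROM my_table WHERE column LIKE '%somestring%'").show()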
Answer 6:
To replicate the case-insensitive ILIKE, you can use lower in conjunction with like.
from pyspark.sql.functions import col, lower
df.where(lower(col('col1')).like("%string%")).show()
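Newer Spark releases (3.3+, if I recall the version correctly) also expose an ilike method directly on Column, which avoids the lower call:
# Column.ilike performs a case-insensitive SQL LIKE match (Spark 3.3+).
df.where(col('col1').ilike('%string%')).show()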
Answer 7:
Using Spark 2.4, to negate the match you can simply do:
df = df.filter("column not like '%bla%'")
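The same negation is available in the Column API by inverting the condition with ~:
# ~ negates a Column condition, equivalent to SQL NOT LIKE.
df = df.filter(~df.column.like('%bla%'))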
Answer 8:
I always use a UDF to implement such functionality:
from pyspark.sql import functions as F
from pyspark.sql.types import BooleanType

# UDF returning True when the value contains the substring 's'.
like_f = F.udf(lambda s: 's' in s, BooleanType())
df.filter(like_f('column')).select('column')
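For what it's worth, the same filter can be written without a UDF using the built-in contains, which typically runs faster because it avoids the Python serialization round-trip:
# Column.contains checks for a literal substring natively in the JVM.
df.filter(F.col('column').contains('s')).select('column')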
Source: https://stackoverflow.com/questions/40220943/pyspark-dataframe-like-operator