PySpark DataFrame LIKE operator

Submitted by 无人久伴 on 2019-12-21 07:09:36

Question


What is the PySpark equivalent of the SQL LIKE operator? For example, I would like to do:

SELECT * FROM table WHERE column LIKE "*somestring*";

I'm looking for something easy like this (but it is not working):

df.select('column').where(col('column').like("*s*")).show()

Answer 1:


You can use the where and col functions to do this. where filters rows based on a condition (here, whether a column matches '%string%'), col('col_name') refers to the column, and like applies the SQL LIKE pattern:

df.where(col('col1').like("%string%")).show()



Answer 2:


From Spark 2.0.0 onwards, the following also works fine:

df.select('column').where("column like '%s%'").show()




Answer 3:


Use the like operator.

https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#module-pyspark.sql.functions

df.filter(df.column.like('%s%')).show()



Answer 4:


Note that the pattern must use SQL LIKE wildcard syntax (% matches any sequence of characters), not glob-style *:

df.select('column').where(col('column').like("%s%")).show()



Answer 5:


In PySpark you can always register the DataFrame as a table and query it with SQL. Note that SQL LIKE uses % as the wildcard, not *:

df.registerTempTable('my_table')  # or df.createOrReplaceTempView('my_table') in Spark 2.0+
query = """SELECT * FROM my_table WHERE column LIKE '%somestring%'"""
sqlContext.sql(query).show()



Answer 6:


To replicate the case-insensitive ILIKE, you can use lower in conjunction with like.

from pyspark.sql.functions import col, lower

df.where(lower(col('col1')).like("%string%")).show()



Answer 7:


Using Spark 2.4, to negate the match you can simply do:

df = df.filter("column not like '%bla%'")



Answer 8:


I always use a UDF to implement such functionality:

from pyspark.sql import functions as F
from pyspark.sql.types import BooleanType

like_f = F.udf(lambda s: 's' in s, BooleanType())
df.filter(like_f('column')).select('column')


Source: https://stackoverflow.com/questions/40220943/pyspark-dataframe-like-operator
