Pyspark filter using startswith from list

有些话、适合烂在心里 提交于 2020-06-24 07:44:33

问题


I have a list of elements that may start a couple of strings that are of record in an RDD. If I have and element list of yes and no, they should match yes23 and no3 but not 35yes or 41no. Using pyspark, how can i use startswith any element in list or tuple.

An example DF would be:

+-----+------+
|index| label|
+-----+------+
|    1|yes342|
|    2| 45yes|
|    3| no123|
|    4|  75no|
+-----+------+

When I try:

Element_List = ['yes','no']
filter_DF = DF.where(DF.label.startswith(tuple(Element_List)))

The resulting df should look something like:

+-----+------+
|index| label|
+-----+------+
|    1|yes342|
|    3| no123|
+-----+------+

Instead I get The error:

Py4JError: An error occurred while calling o250.startsWith. Trace:
py4j.Py4JException: Method startsWith([class java.util.ArrayList]) does not exist
at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:318)
at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:326)
at py4j.Gateway.invoke(Gateway.java:272)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:745)

is prompted, so it looks like startsWith can't be used with any type of list. Is there a simple work around?


回答1:


Compose expression like this:

from pyspark.sql.functions import col, lit
from functools import reduce

element_list = ['yes','no']

df = spark.createDataFrame(
    ["yes23", "no3", "35yes", """41no["maybe"]"""],
    "string"
).toDF("location")

starts_with = reduce(
    lambda x, y: x | y,
    [col("location").startswith(s) for s in element_list], 
    lit(False))

df.where(starts_with).show()
# +--------+
# |location|
# +--------+
# |   yes23|
# |     no3|
# +--------+


来源:https://stackoverflow.com/questions/48549326/pyspark-filter-using-startswith-from-list

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!