How to efficiently check if a list of words is contained in a Spark Dataframe?

春和景丽 2020-12-17 06:07

Using PySpark dataframes I'm trying to do the following as efficiently as possible. I have a dataframe with a column which contains text and a list of words I want to filte

2 answers

春和景丽 (original poster)

2020-12-17 06:42

Well, I have tried this, and it does not work as expected if you change the word list:

words_list = ['foo', 'is', 'bar']

The result remains the same and it doesn't show the other words.

+----+----+------------------+--------------+
|col1|col2|     col_with_text|extracted_word|
+----+----+------------------+--------------+
|   a|   b|      foo is tasty|           foo|
|  12|  34|       blah blahhh|              |
| yeh|   0|       bar of yums|           bar|
|haha|   1|       foobar none|              |
|hehe|   2|something bar else|              |
+----+----+------------------+--------------+
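
For comparison, here is a minimal plain-Python sketch of what the output might look like if every word from the list were reported for each row, rather than only a single match. The names rows, extract_words and extracted are hypothetical and used only for illustration, and matching on whole space-separated tokens is an assumption (it is why "foobar" does not count as "foo" or "bar"). It is not the code from the answer being discussed.

```python
words_list = ['foo', 'is', 'bar']

# Hypothetical sample rows mirroring the table above: (col1, col2, col_with_text).
rows = [
    ('a', 'b', 'foo is tasty'),
    ('12', '34', 'blah blahhh'),
    ('yeh', '0', 'bar of yums'),
    ('haha', '1', 'foobar none'),
    ('hehe', '2', 'something bar else'),
]


def extract_words(text, words):
    """Return every whole token of `text` that is in `words`, in text order."""
    wanted = set(words)  # set lookup keeps each membership test cheap
    return [token for token in text.split(' ') if token in wanted]


# One list of matched words per row, instead of a single extracted word.
extracted = [extract_words(text, words_list) for _, _, text in rows]

for (col1, col2, text), found in zip(rows, extracted):
    print(col1, col2, text, found)
```

In PySpark itself, the same idea could probably be expressed without a Python UDF by combining pyspark.sql.functions.split with pyspark.sql.functions.array_intersect (available from Spark 2.4), which keeps the work inside the JVM. That translation is a suggestion and has not been run against the dataframe above.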
