Using PySpark dataframes, I'm trying to do the following as efficiently as possible. I have a dataframe with a column which contains text and a list of words I want to filte
Well, I tried this with a different word list:
words_list = ['foo', 'is', 'bar']
The result remains the same; it doesn't show the other words.
+----+----+------------------+--------------+
|col1|col2|     col_with_text|extracted_word|
+----+----+------------------+--------------+
|   a|   b|      foo is tasty|           foo|
|  12|  34|       blah blahhh|              |
| yeh|   0|       bar of yums|           bar|
|haha|   1|       foobar none|              |
|hehe|   2|something bar else|              |
+----+----+------------------+--------------+
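
One possible explanation, which is only a guess since the extraction code isn't shown here: a first-match extraction (for example `regexp_extract`) returns a single word per row, so `is` would never show up next to `foo`. Below is a minimal plain-Python sketch of collecting every whole-word match instead. The helper name `extract_words` and the sample `rows` list are hypothetical, made up for illustration.

```python
import re

def extract_words(text, words_list):
    """Return every whole-word match from words_list found in text, in order of appearance.

    Hypothetical helper for illustration; not taken from the original post.
    """
    if text is None or not words_list:
        return []
    # \b word boundaries keep 'foo' from matching inside 'foobar'.
    # re.escape guards against words that contain regex metacharacters.
    pattern = r"\b(?:" + "|".join(re.escape(w) for w in words_list) + r")\b"
    return re.findall(pattern, text)

words_list = ['foo', 'is', 'bar']

# Sample values copied from the col_with_text column above.
rows = ["foo is tasty", "blah blahhh", "bar of yums", "foobar none", "something bar else"]
results = {text: extract_words(text, words_list) for text in rows}
```

With this sketch, `foo is tasty` gives `['foo', 'is']` and `foobar none` gives an empty list. Note that `something bar else` gives `['bar']`, which differs from the empty cell in the table above. To apply it to a dataframe, one option would be wrapping the function with `pyspark.sql.functions.udf` and an `ArrayType(StringType())` return type, although a Python UDF is usually slower than built-in column functions.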