Removing list of words from a string

一个人的身影 2020-11-30 06:54

I have a list of stopwords and a search string. I want to remove the stopwords from the string.

As an example:

stopwords=['what','who','


        
5 Answers
  •  情话喂你
    2020-11-30 07:06

    The accepted answer works when the words are separated by spaces, but real-world text often uses punctuation between words as well. In that case, re.split is required.
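
To illustrate the punctuation problem, here is a minimal sketch of the whitespace-only approach. It assumes the same query and stopwords used in the proposal in this answer; the variable names `tokens` and `kept` are only illustrative.

```python
query = 'What is hello? Says Who?'
stopwords = {'what', 'who', 'is', 'a', 'at', 'he'}

# str.split() only breaks on whitespace, so punctuation stays glued to the words.
tokens = query.split()
print(tokens)  # ['What', 'is', 'hello?', 'Says', 'Who?']

# 'who?' is not equal to 'who', so this stopword survives the filter.
kept = [t for t in tokens if t.lower() not in stopwords]
print(kept)  # ['hello?', 'Says', 'Who?']
```

The final "Who?" is not removed because the question mark is still attached to it.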

    Also, storing the stopwords in a set makes each membership test faster (although with only a few words, the cost of hashing each string may offset the gain).

    My proposal:

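    import re
    
    query = 'What is hello? Says Who?'
    stopwords = {'what', 'who', 'is', 'a', 'at', 'he'}
    
    # Split on runs of non-word characters. The trailing '?' leaves an empty
    # string at the end of the split, so empty tokens are skipped as well.
    resultwords = [word for word in re.split(r"\W+", query)
                   if word and word.lower() not in stopwords]
    print(resultwords)
    

    Output (as a list of words):

    ['hello', 'Says']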
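
Since the question asks for a string rather than a list, the filtered words can be joined back together. The following is only a sketch: `remove_stopwords` is a hypothetical helper name, not part of the answer above, and it swaps `re.split` for `re.findall(r"\w+", ...)`, which returns only the word tokens. It assumes that dropping the punctuation and rejoining with single spaces is acceptable.

```python
import re

def remove_stopwords(text, stopwords):
    """Return text with stopwords removed (hypothetical helper for illustration)."""
    # Lowercase the stopwords once so the comparison is case-insensitive.
    stop = {w.lower() for w in stopwords}
    # re.findall(r"\w+") yields only word tokens, so no empty strings appear.
    words = re.findall(r"\w+", text)
    # Punctuation is dropped; the remaining words are joined with single spaces.
    return " ".join(w for w in words if w.lower() not in stop)

print(remove_stopwords('What is hello? Says Who?',
                       ['what', 'who', 'is', 'a', 'at', 'he']))
# hello Says
```

If the original punctuation and spacing must be preserved, a different approach would be needed, for example substituting matches in place with `re.sub`.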
    
