Removing list of words from a string

一个人的身影 2020-11-30 06:54

I have a list of stopwords. And I have a search string. I want to remove the words from the string.

As an example:

stopwords=['what','who','


        
5 Answers
  • 2020-11-30 06:55

    Looking at the other answers to your question, I noticed that they show you how to do what you are trying to do, but none of them answers the question you posed at the end.

    If the input query is "What is Hello", I get the output as:

    wht s llo

    Why does this happen?

    This happens because .replace() removes every occurrence of the exact substring you give it, wherever it appears, including inside longer words. It is also case-sensitive.

    for example:

    "My, my! Hello my friendly mystery".replace("my", "")
    

    gives:

    >>> "My, ! Hello  friendly stery"
    

    .replace() is essentially splitting the string by the substring given as the first parameter and joining it back together with the second parameter.

    "hello".replace("he", "je")
    

    is logically similar to:

    "je".join("hello".split("he"))
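A quick way to check that equivalence, as a minimal sketch. The helper name `replace_via_split` is made up for illustration, and the comparison only holds for a non-empty search string:

```python
def replace_via_split(text, old, new):
    # Hypothetical helper: mimic str.replace by splitting on `old`
    # and joining the pieces back together with `new`.
    return new.join(text.split(old))

sentence = "My, my! Hello my friendly mystery"

via_replace = sentence.replace("my", "")
via_split = replace_via_split(sentence, "my", "")

print(via_replace)               # the substring is gone everywhere, even inside "mystery"
print(via_replace == via_split)  # both approaches agree
```

Note that the capitalised "My" survives in both cases, because the match is case-sensitive.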
    

    If you still wanted to use .replace() to remove whole words, you might think that adding a space before and after would be enough, but this misses words at the beginning and end of the string, as well as occurrences that sit next to punctuation.

    "My, my! hello my friendly mystery".replace(" my ", " ")
    >>> "My, my! hello friendly mystery"
    
    "My, my! hello my friendly mystery".replace(" my", "")
    >>> "My,! hello friendlystery"
    
    "My, my! hello my friendly mystery".replace("my ", "")
    >>> "My, my! hello friendly mystery"
    

    Additionally, padding the word with spaces will not catch consecutive duplicates. Matches cannot overlap, so the space that ends the first match has already been consumed and the second "my" no longer has a leading space to match:

    "hello my my friend".replace(" my ", " ")
    >>> "hello my friend"
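If you do want a replace-style approach, one option that sidesteps both problems is a regular expression with `\b` word boundaries, which match a position without consuming the surrounding characters. This is a swapped-in technique rather than something proposed in this thread, and the function name `remove_whole_words` is made up for illustration:

```python
import re

def remove_whole_words(text, words):
    # Hypothetical helper: delete each listed word only where it appears
    # as a whole word, ignoring case. re.escape guards against words that
    # contain regex metacharacters.
    pattern = r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b"
    stripped = re.sub(pattern, "", text, flags=re.IGNORECASE)
    # Collapse the runs of whitespace left behind by the removed words.
    return re.sub(r"\s+", " ", stripped).strip()

print(remove_whole_words("hello my my friend", ["my"]))
print(remove_whole_words("My, my! Hello my friendly mystery", ["my"]))
```

The back-to-back duplicates are both removed and "mystery" is left intact. Punctuation that was attached to a removed word stays in the string, so you may still want to clean that up separately.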
    

    For these reasons your accepted answer by Robby Cornelissen is the recommended way to do what you are wanting.

  • 2020-11-30 06:56

    Building on what karthikr said, try:

    ' '.join(filter(lambda x: x.lower() not in stopwords,  query.split()))
    

    explanation:

    query.split() # splits the variable query on whitespace, e.g. "What is hello" -> ["What","is","hello"]
    
    filter(func,iterable) # takes a function and an iterable (list/string/etc.) and
                          # keeps only the items for which the function, called on one
                          # item at a time, returns True
    
    lambda x: x.lower() not in stopwords   # anonymous function that takes in a variable,
                                           # converts it to lower case, and returns True if
                                           # the word is not in the iterable stopwords
    
    
    ' '.join(iterable) # joins all items of the iterable (items must be strings)
                       # using the string in front of the dot, i.e. ' ', as a joiner,
                       # e.g. ["What", "is","hello"] -> "What is hello"
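Here is the same one-liner unrolled into its three steps, as a small runnable sketch. The stopword list is borrowed from the accepted answer's example, and the intermediate variable names are only for illustration:

```python
stopwords = ['what', 'who', 'is', 'a', 'at', 'he']
query = "What is hello"

# Step 1: split the query into words on whitespace.
words = query.split()

# Step 2: keep only the words whose lower-case form is not a stopword.
# In Python 3, filter() returns a lazy iterator, so list() is used here
# just to make the intermediate result visible.
kept = list(filter(lambda x: x.lower() not in stopwords, words))

# Step 3: glue the surviving words back together with single spaces.
result = ' '.join(kept)

print(words)   # ['What', 'is', 'hello']
print(kept)    # ['hello']
print(result)  # hello
```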
    
  • 2020-11-30 07:03
    stopwords=['for','or','to']
    p='Asking for help, clarification, or responding to other answers.'
    for i in stopwords:
      # caution: str.replace removes the substring everywhere, even inside longer words
      p=p.replace(i,'')
    print(p)
    
  • 2020-11-30 07:06

    The accepted answer works when the query is a list of words separated by spaces, but real text often uses punctuation to separate words. In that case, re.split is needed.

    Also, storing the stopwords in a set makes each lookup faster (although with only a handful of words, the cost of hashing the string can offset the gain).
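To make the punctuation problem concrete, here is a short sketch of what whitespace splitting does with a punctuated query. The query is the one from this answer's own example, and the variable names are only for illustration:

```python
query = 'What is hello? Says Who?'
stopwords = {'what', 'who', 'is', 'a', 'at', 'he'}  # a set, for fast membership tests

# str.split() only breaks on whitespace, so punctuation stays attached.
tokens = query.split()

# 'Who?' lower-cases to 'who?', which is not in the set, so it survives.
kept = [word for word in tokens if word.lower() not in stopwords]

print(tokens)  # ['What', 'is', 'hello?', 'Says', 'Who?']
print(kept)    # ['hello?', 'Says', 'Who?']
```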

    My proposal:

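    import re
    
    query = 'What is hello? Says Who?'
    stopwords = {'what','who','is','a','at','he'}
    
    # raw string avoids the invalid-escape warning for \W; "if word" drops the
    # empty string that re.split yields when the query ends with punctuation
    resultwords  = [word for word in re.split(r"\W+",query) if word and word.lower() not in stopwords]
    print(resultwords)
    

    output (as list of words):

    ['hello', 'Says']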
    
  • 2020-11-30 07:18

    This is one way to do it:

    query = 'What is hello'
    stopwords = ['what', 'who', 'is', 'a', 'at', 'is', 'he']
    querywords = query.split()
    
    resultwords  = [word for word in querywords if word.lower() not in stopwords]
    result = ' '.join(resultwords)
    
    print(result)
    

    I noticed that you want to also remove a word if its lower-case variant is in the list, so I've added a call to lower() in the condition check.
