How to add tags to negated words in strings that follow “not”, “no” and “never”

醉话见心 2021-01-03 12:01

How do I add the tag NEG_ to all words that follow not, no and never until the next punctuation mark in a string (used for

3 Answers
  •  独厮守ぢ
    2021-01-03 12:17

    Python's re module lacks some of Perl's regex features, but you can make up for that by passing a function (here a lambda) as the replacement argument of re.sub, so that each replacement is built dynamically from the match:

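    import re

    string = "It was never going to work, he thought. He did not play so well, so he had to practice some more. Not foobar !"
    # Find each negation keyword plus the words after it, up to a punctuation mark,
    # then prefix every whitespace-preceded word inside that span with NEG_.
    transformed = re.sub(r'\b(?:not|never|no)\b[\w\s]+[^\w\s]',
                         lambda match: re.sub(r'(\s+)(\w+)', r'\1NEG_\2', match.group(0)),
                         string,
                         flags=re.IGNORECASE)
    print(transformed)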
    

    This prints:

    It was never NEG_going NEG_to NEG_work, he thought. He did not NEG_play NEG_so NEG_well, so he had to practice some more. Not NEG_foobar !
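    The lambda above is just a callable that re.sub invokes once per match object, using its return value as the replacement. As a sketch, the same two-stage substitution can be written with a named callback, which is easier to read and reuse; the names NEGATED_SPAN, SPACED_WORD, tag_span and tag_negations are hypothetical, not part of the answer.

```python
import re

# Hypothetical names. Stage 1: a negation keyword followed by words/spaces,
# up to the first punctuation mark (same pattern as the answer).
NEGATED_SPAN = re.compile(r'\b(?:not|never|no)\b[\w\s]+[^\w\s]', re.IGNORECASE)
# Stage 2: whitespace followed by a word, inside one negated span.
SPACED_WORD = re.compile(r'(\s+)(\w+)')


def tag_span(match):
    """Called by re.sub once per negated span; returns the tagged span."""
    return SPACED_WORD.sub(r'\1NEG_\2', match.group(0))


def tag_negations(text):
    """Tag every word after not/no/never up to the next punctuation mark."""
    return NEGATED_SPAN.sub(tag_span, text)


print(tag_negations("He did not play so well, so he had to practice."))
```

    The keyword itself is never tagged, because the span starts at the keyword and stage 2 only touches words preceded by whitespace.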
    

    Explanation

    • The first step is to select the parts of your string you're interested in. This is done with

      \b(?:not|never|no)\b[\w\s]+[^\w\s]
      

      Your negation keyword (\b is a word boundary, (?:...) a non-capturing group), followed by word characters and whitespace (\w matches [a-zA-Z0-9_] for ASCII text, and in Python 3 also other Unicode letters and digits; \s matches any whitespace character), up to the first character that is neither a word character nor whitespace (which acts as the punctuation mark).

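      Note that the punctuation mark is mandatory here, so a negated phrase that runs to the end of the string without punctuation is left untagged. You can safely remove [^\w\s] to handle that case too: [\w\s]+ is greedy, so it still stops at the first punctuation mark, but it can now also stop at the end of the string.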

    • Each match is now a string such as never going to work, so within it, select every word preceded by whitespace with

      (\s+)(\w+)
      

      and replace each one with the following, where \1 puts the whitespace back and \2 is the word itself:

      \1NEG_\2
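    A minimal sketch of the end-of-string variant described in the note on [^\w\s], assuming you drop that trailing class as suggested; tag_negations is a hypothetical helper name.

```python
import re

# The answer's pattern without the trailing [^\w\s]: [\w\s]+ is greedy, so it
# still stops at the first punctuation mark, but it can also run to the end
# of the string.
NEGATED_SPAN = re.compile(r'\b(?:not|never|no)\b[\w\s]+', re.IGNORECASE)


def tag_negations(text):
    """Hypothetical helper: tag words after not/no/never up to punctuation or end."""
    return NEGATED_SPAN.sub(
        lambda m: re.sub(r'(\s+)(\w+)', r'\1NEG_\2', m.group(0)),
        text,
    )


print(tag_negations("He said no, but she would never give up"))
```

    Here "no," is left alone because [\w\s]+ needs at least one character before the comma, while "never give up" is tagged even though no punctuation follows it; the original pattern would not tag it. If \w should match only [a-zA-Z0-9_], add re.ASCII to the flags (re.IGNORECASE | re.ASCII).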
      
