How to find and replace the nth occurrence of a word in a sentence using a Python regular expression?

广开言路 2020-12-01 11:00

Using a Python regular expression only, how can I find and replace the nth occurrence of a word in a sentence? For example:

str = 'cat goose  mouse horse pig cat cow'


        
7 Answers
  • 2020-12-01 11:38

    Use a negative lookahead, as shown below.

    >>> import re
    >>> s = "cat goose  mouse horse pig cat cow"
    >>> re.sub(r'^((?:(?!cat).)*cat(?:(?!cat).)*)cat', r'\1Bull', s)
    'cat goose  mouse horse pig Bull cow'
    


    • ^ asserts that we are at the start of the string.
    • (?:(?!cat).)* matches any character, zero or more times, as long as the substring cat does not start at that character.
    • cat matches the first cat substring.
    • (?:(?!cat).)* again matches zero or more characters without running into another cat.
    • All of the above is enclosed in a capturing group, ((?:(?!cat).)*cat(?:(?!cat).)*), so that the captured characters can be referred to later as \1.
    • The final cat matches the second cat substring, which is the one that gets replaced.
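
    To check the breakdown above, here is a small sketch that runs the same pattern through re.search and inspects the capturing group. The names pattern, prefix and second_cat_start are only illustrative and do not come from the answer.

```python
import re

s = "cat goose  mouse horse pig cat cow"

# Group 1 spans everything up to, but not including, the second "cat".
pattern = r'^((?:(?!cat).)*cat(?:(?!cat).)*)cat'
m = re.search(pattern, s)

prefix = m.group(1)          # the text that \1 puts back in the replacement
second_cat_start = m.end(1)  # index at which the second "cat" begins

print(repr(prefix))
print(second_cat_start)
```

    Because the group ends exactly where the second cat starts, the replacement r'\1Bull' keeps the prefix and swaps only that occurrence.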

    OR

    >>> s = "cat goose  mouse horse pig cat cow"
    >>> re.sub(r'^(.*?(cat.*?){1})cat', r'\1Bull', s)
    'cat goose  mouse horse pig Bull cow'
    

    The number inside the {} is the number of occurrences of cat to skip, so put n-1 there to replace the nth occurrence.

    To replace the third occurrence of the string cat, put 2 inside the curly braces:

    >>> re.sub(r'^(.*?(cat.*?){2})cat', r'\1Bull', "cat goose  mouse horse pig cat foo cat cow")
    'cat goose  mouse horse pig cat foo Bull cow'
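
    The same idea can be wrapped in a helper that builds the pattern for any n. This is only a sketch under a few assumptions that are not part of the answer: the function name replace_nth_regex is made up, the word is passed through re.escape so it is treated as literal text, the inner group is made non-capturing, and re.DOTALL is added so that the skipped text may contain newlines.

```python
import re

def replace_nth_regex(text, word, replacement, n):
    """Replace the nth (1-based) occurrence of word with a single re.sub call."""
    if n < 1:
        raise ValueError("n must be >= 1")
    literal = re.escape(word)
    # (?:word.*?){n-1} skips the first n-1 occurrences; group 1 keeps them.
    pattern = r'^(.*?(?:%s.*?){%d})%s' % (literal, n - 1, literal)
    # A function replacement avoids backslash processing in `replacement`.
    return re.sub(pattern, lambda m: m.group(1) + replacement,
                  text, count=1, flags=re.DOTALL)

print(replace_nth_regex("cat goose  mouse horse pig cat foo cat cow", "cat", "Bull", 3))
```

    If the text holds fewer than n occurrences, the pattern does not match and the text comes back unchanged.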
    


  • 2020-12-01 11:40

    Here's a way to do it without a regex:

    def replaceNth(s, source, target, n):
        inds = [i for i in range(len(s) - len(source)+1) if s[i:i+len(source)]==source]
        if len(inds) < n:
            return  # or maybe raise an error
        s = list(s)  # can't assign to string slices. So, let's listify
        s[inds[n-1]:inds[n-1]+len(source)] = target  # do n-1 because we start from the first occurrence of the string, not the 0-th
        return ''.join(s)
    

    Usage:

    In [278]: s
    Out[278]: 'cat goose  mouse horse pig cat cow'
    
    In [279]: replaceNth(s, 'cat', 'Bull', 2)
    Out[279]: 'cat goose  mouse horse pig Bull cow'
    
    In [280]: print(replaceNth(s, 'cat', 'Bull', 3))
    None
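
    If returning None for a missing occurrence is not what you want, a variant along the following lines raises an error instead. It is a sketch, not the function above: the name replace_nth and the choice of ValueError are assumptions, and it walks the string with str.find instead of building the full index list.

```python
def replace_nth(s, source, target, n):
    """Replace the nth (1-based) occurrence of source, counting overlapping matches."""
    if n < 1:
        raise ValueError("n must be >= 1")
    index = -1
    for _ in range(n):
        # Step forward by one character so overlapping occurrences are
        # counted, as in the list-comprehension version.
        index = s.find(source, index + 1)
        if index == -1:
            raise ValueError("fewer than %d occurrences of %r" % (n, source))
    return s[:index] + target + s[index + len(source):]

print(replace_nth('cat goose  mouse horse pig cat cow', 'cat', 'Bull', 2))
```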
    
  • 2020-12-01 11:42

    I use a simple function that lists all occurrences, picks the position of the nth one, and uses it to split the original string into two substrings. It then replaces the first occurrence in the second substring and joins the substrings back into a new string:

    import re
    
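    def replacenth(string, sub, wanted, n):
        # Start index of the nth (1-based) match; re.escape treats sub as literal text
        where = [m.start() for m in re.finditer(re.escape(sub), string)][n-1]
        before = string[:where]
        after = string[where:]
        # str.replace returns a new string, so the result must be assigned
        after = after.replace(sub, wanted, 1)
        newString = before + after
        print(newString)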
    

    For these variables:

    string = 'ababababababababab'
    sub = 'ab'
    wanted = 'CD'
    n = 5
    

    outputs:

    ababababCDabababab
    

    Notes:

    The list comprehension collects the start positions of all matches, and where is the nth one picked from that list. List indexes start at 0, not 1, so the index is n-1 while n itself is the ordinal of the occurrence; my example replaces the 5th occurrence. If you indexed with n directly, you would need n = 4 to reach the 5th occurrence. Which convention you use depends on the code that produces n.

    This should be the simplest way, but it isn't regex only as you originally wanted.
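
    As a variation on the same split-and-join idea, the sketch below returns the new string instead of printing it and leaves the input untouched when there are fewer than n matches. The name replace_nth_match, the use of itertools.islice and the return-unchanged behaviour are choices made for this example, not part of the function above.

```python
import re
from itertools import islice

def replace_nth_match(string, sub, wanted, n):
    """Return string with the nth (1-based) occurrence of sub replaced by wanted."""
    if n < 1:
        raise ValueError("n must be >= 1")
    matches = re.finditer(re.escape(sub), string)
    # islice positions are 0-based, so the nth match sits at position n - 1.
    match = next(islice(matches, n - 1, None), None)
    if match is None:
        return string  # fewer than n occurrences: leave the input unchanged
    return string[:match.start()] + wanted + string[match.end():]

print(replace_nth_match('ababababababababab', 'ab', 'CD', 5))
```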

    Sources and some links in addition:

    • where construction: Find all occurrences of a substring in Python
    • string splitting: https://www.daniweb.com/programming/software-development/threads/452362/replace-nth-occurrence-of-any-sub-string-in-a-string
    • similar question: Find the nth occurrence of substring in a string
  • 2020-12-01 11:44

    You can match the two occurrences of "cat", keep everything before the second occurrence (\1) and add "Bull":

    new_str = re.sub(r'(cat.*?)cat', r'\1Bull', str, 1)
    

    We do only one substitution to avoid also replacing the fourth, sixth, etc. occurrence of "cat" (when there are at least four occurrences), as pointed out in a comment by Avinash Raj.
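
    The effect of the count argument is easy to see on a string with four occurrences. The sample text and variable names below are made up for illustration.

```python
import re

text = "cat one cat two cat three cat four"

# Without a count, every non-overlapping "cat ... cat" pair is rewritten,
# so the 2nd and the 4th occurrence both change.
all_pairs = re.sub(r'(cat.*?)cat', r'\1Bull', text)

# With count=1 only the first pair is rewritten, i.e. just the 2nd occurrence.
first_pair = re.sub(r'(cat.*?)cat', r'\1Bull', text, count=1)

print(all_pairs)   # cat one Bull two cat three Bull four
print(first_pair)  # cat one Bull two cat three cat four
```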

    If you want to replace the n-th occurrence and not the second, use:

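    n = 2
    # The inner group is non-capturing so that \1 holds the whole prefix, not just its last repetition
    new_str = re.sub('((?:cat.*?){%d})' % (n - 1) + 'cat', r'\1Bull', str, 1)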
    

    BTW, you should not use str as a variable name, since it shadows Python's built-in str type.

  • 2020-12-01 11:46

    How to replace the nth needle with word (this assumes the placeholder '$$$' does not already occur in s):

    s.replace(needle,'$$$',n-1).replace(needle,word,1).replace('$$$',needle)
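
    The one-liner can be unpacked into three named steps. The sketch below does that and also guards against the placeholder already occurring in the inputs; the function name replace_nth_placeholder, the default "\0" placeholder and the ValueError are assumptions added for this example.

```python
def replace_nth_placeholder(s, needle, word, n, placeholder="\0"):
    """Replace the nth (1-based) needle by temporarily masking the first n-1."""
    if n < 1:
        raise ValueError("n must be >= 1")
    if placeholder in s or placeholder in needle or placeholder in word:
        raise ValueError("placeholder must not occur in the inputs")
    masked = s.replace(needle, placeholder, n - 1)  # hide the first n-1 needles
    swapped = masked.replace(needle, word, 1)       # the next needle is the nth
    return swapped.replace(placeholder, needle)     # restore the hidden ones

print(replace_nth_placeholder("cat goose  mouse horse pig cat cow", "cat", "Bull", 2))
```

    Without the guard, a placeholder that already appears in s would be turned into needle by the last replace call.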
    
  • 2020-12-01 11:51

    Create a repl function to pass into re.sub(). Except... the trick is to make it a class so you can track the call count.

    class ReplWrapper(object):
        def __init__(self, replacement, occurrence):
            self.count = 0
            self.replacement = replacement
            self.occurrence = occurrence
        def repl(self, match):
            self.count += 1
            if self.occurrence == 0 or self.occurrence == self.count:
                return match.expand(self.replacement)
            else:
                return match.group(0)  # leave every other match unchanged
    

    Then use it like this:

    myrepl = ReplWrapper(r'Bull', 0) # replaces all instances in a string
    new_str = re.sub(r'cat', myrepl.repl, str)
    
    myrepl = ReplWrapper(r'Bull', 1) # replaces 1st instance in a string
    new_str = re.sub(r'cat', myrepl.repl, str)
    
    myrepl = ReplWrapper(r'Bull', 2) # replaces 2nd instance in a string
    new_str = re.sub(r'cat', myrepl.repl, str)
    

    I'm sure there is a more clever way to avoid using a class, but this seemed straightforward enough to explain. Also, be sure to return match.expand(), as just returning the replacement value is not technically correct if someone decides to use \1-type templates.
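
    One possible way to do without the class is a closure that keeps the call count in an itertools.count iterator. This is a sketch only: nth_repl is a made-up name, and unlike the class above it does not treat an occurrence of 0 as "replace all".

```python
import re
from itertools import count

def nth_repl(replacement, occurrence):
    """Build a repl callable that rewrites only the given 1-based occurrence."""
    calls = count(1)  # yields 1, 2, 3, ... one value per match
    def repl(match):
        if next(calls) == occurrence:
            return match.expand(replacement)  # honours \1-style templates
        return match.group(0)                 # keep other matches unchanged
    return repl

text = "cat goose  mouse horse pig cat cow"
result = re.sub(r'cat', nth_repl(r'Bull', 2), text)
print(result)
```

    A fresh callable has to be built for every re.sub call, since the counter is not reset between calls.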
