Replace patterns that are inside delimiters using a regular expression call

轮回少年 2021-01-07 02:13

I need to clip out all the occurrences of the pattern '--' that are inside single quotes in a long string (leaving intact the ones that are outside single quotes).

5 answers

没有蜡笔的小新 (original poster)

2021-01-07 02:26

If bending the rules a little is allowed, this could work:

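import re

# Group 1 captures everything up to a run of dashes inside quotes; group 2 is the run itself.
p = re.compile(r"((?:^[^']*')?[^']*?(?:'[^']*'[^']*?)*?)(-{2,})")
txt = "xxxx rt / $ 'dfdf--fggh-dfgdfg' ghgh- ffffdd -- 'dfdf' ghh-g '--ggh--' vcbcvb"
# Keep group 1 and collapse the run of dashes to a single dash.
print(p.sub(r'\1-', txt))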

Output:

xxxx rt / $ 'dfdf-fggh-dfgdfg' ghgh- ffffdd -- 'dfdf' ghh-g '-ggh-' vcbcvb

The regex:

(               # Group 1
  (?:^[^']*')?  # Start of string, up till the first single quote
  [^']*?        # Inside the single quotes, as few characters as possible
  (?:
    '[^']*'     # No double dashes inside these single quotes, jump to the next.
    [^']*?
  )*?           # as few as possible
)
(-{2,})         # The dashes themselves (Group 2)
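
The annotated layout above can be compiled as written if it is passed with Python's re.VERBOSE flag, which ignores whitespace and # comments inside the pattern. A minimal sketch, assuming Python 3; the name DASHES_IN_QUOTES is made up for illustration, and the loop just prints what each group captured:

```python
import re

# Same pattern as above, kept in its commented layout via re.VERBOSE.
DASHES_IN_QUOTES = re.compile(r"""
    (                 # Group 1: everything before the dashes
      (?:^[^']*')?    # start of string, up to the first single quote
      [^']*?          # inside the quotes, as few characters as possible
      (?:
        '[^']*'       # closing quote, unquoted text, next opening quote
        [^']*?        # inside the next quoted section
      )*?             # as few as possible
    )
    (-{2,})           # Group 2: the dashes themselves
""", re.VERBOSE)

txt = "xxxx rt / $ 'dfdf--fggh-dfgdfg' ghgh- ffffdd -- 'dfdf' ghh-g '--ggh--' vcbcvb"

# Show what each match captured in group 1 and group 2.
matches = list(DASHES_IN_QUOTES.finditer(txt))
for m in matches:
    print(repr(m.group(1)), repr(m.group(2)))

# Keep group 1, replace the run of dashes with a single dash.
result = DASHES_IN_QUOTES.sub(r"\1-", txt)
print(result)
```

Note that group 1 of the second match spans the unquoted " -- " as well, which is how those dashes are skipped rather than rewritten.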

If there were different delimiters for start and end (for example ' to open and ` to close), you could use something like this:

-{2,}(?=[^'`]*`)
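
To see the lookahead version in action, here is a small sketch. It assumes sections open with ' and close with `, that the delimiters are balanced, and the sample string is made up. The lookahead only succeeds when the next delimiter after the dashes is the closing one, which means the dashes sit inside a section:

```python
import re

# Dashes followed, with no delimiter in between, by the closing delimiter `.
p = re.compile(r"-{2,}(?=[^'`]*`)")

# Hypothetical sample: sections open with ' and close with `.
sample = "a -- 'b--c` d -- 'e---f` g --"

result = p.sub("-", sample)
print(result)  # a -- 'b-c` d -- 'e-f` g --
```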

Edit: I realized that if the string does not contain any quotes, it will match all double dashes in the string. One way of fixing it would be to change

(?:^[^']*')?

in the beginning to

(?:^[^']*'|(?!^))

Updated regex:

((?:^[^']*'|(?!^))[^']*?(?:'[^']*'[^']*?)*?)(-{2,})
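
A quick check under Python 3's re module suggests the updated pattern is still not a complete fix: re.sub retries the pattern at every position, so in a string without quotes the (?!^) branch succeeds from the second character onward. The sketch below shows that and then tries a different technique from the single-regex approach: match each complete '...' span and rewrite the dashes inside it with a replacement callback. The helper name collapse_dashes_in_quotes is hypothetical, and the sketch assumes quotes are balanced and never escaped.

```python
import re

UPDATED = re.compile(r"((?:^[^']*'|(?!^))[^']*?(?:'[^']*'[^']*?)*?)(-{2,})")

# No quotes at all, yet the dashes are still rewritten: the attempt at
# index 0 fails, but the retry at index 1 passes (?!^).
still_rewritten = UPDATED.sub(r"\1-", "ab--cd")
print(still_rewritten)  # ab-cd

QUOTED = re.compile(r"'[^']*'")  # one complete single-quoted span
DASHES = re.compile(r"-{2,}")    # a run of two or more dashes


def collapse_dashes_in_quotes(text, replacement="-"):
    """Replace runs of 2+ dashes, but only inside '...' spans."""
    return QUOTED.sub(lambda m: DASHES.sub(replacement, m.group(0)), text)


txt = "xxxx rt / $ 'dfdf--fggh-dfgdfg' ghgh- ffffdd -- 'dfdf' ghh-g '--ggh--' vcbcvb"
print(collapse_dashes_in_quotes(txt))
print(collapse_dashes_in_quotes("ab--cd"))  # ab--cd (unchanged)
```

Passing an empty string as replacement removes the dashes outright instead of collapsing them to a single dash.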
