Regular expression: match word not between quotes

Posted by 百般思念 on 2020-07-15 08:49:28

Question


I would like a Python regular expression that matches a given word when it is not between single quotes. I tried a negative lookahead, (?! ...), without success.

In the sample text below, I would like to match every foe except the one inside the quoted string on the 4th line.

Also, the text is given as one big string.

I have been testing this on regex101; the sample text is below:

var foe = 10;
foe = "";
dark_vador = 'bad guy'
foe = ' I\'m your father, foe ! '
bar = thingy + foe

Answer 1:


The regex solution below works in most cases, but it may break if unbalanced single quotes appear outside string literals, e.g. in comments.

A common regex trick for matching in context is to match and capture what you need to keep, and to match without capturing what you need to replace.

Here is a sample Python demo:

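import re

# Alternative 1 (captured): a single-quoted literal, allowing backslash escapes.
# Alternative 2 (not captured): the target word as a whole word.
rx = r"('[^'\\]*(?:\\.[^'\\]*)*')|\b{0}\b"
s = r"""
    var foe = 10;
    foe = "";
    dark_vador = 'bad guy'
    foe = ' I\'m your father, foe ! '
    bar = thingy + foe"""
toReplace = "foe"
# re.escape keeps any regex metacharacters in the word literal.
# The callback puts quoted literals back unchanged and replaces everything else.
res = re.sub(rx.format(re.escape(toReplace)), lambda m: m.group(1) if m.group(1) else 'NEWORD', s)
print(res)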

With foe substituted in, the regex looks like this:

('[^'\\]*(?:\\.[^'\\]*)*')|\bfoe\b

The ('[^'\\]*(?:\\.[^'\\]*)*') part captures single-quoted string literals into Group 1; when it matches, the literal is simply put back into the result. The \bfoe\b part matches the whole word foe in any other context, and that match is replaced with the new word.
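
For the original goal of locating the matches rather than replacing them, the same alternation can probably be used with re.finditer. The sketch below is a variation on the pattern above, not the answer's own code: it moves the capture group onto the word instead of the literal, and find_unquoted is a hypothetical helper name chosen for illustration.

```python
import re

def find_unquoted(word, text):
    """Return start offsets of `word` found outside single-quoted literals."""
    # First alternative consumes a whole quoted literal (no capture);
    # second alternative captures the word only when it is outside one.
    pattern = re.compile(
        r"'[^'\\]*(?:\\.[^'\\]*)*'|\b(" + re.escape(word) + r")\b"
    )
    return [m.start(1) for m in pattern.finditer(text) if m.group(1)]

sample = r"""var foe = 10;
foe = "";
dark_vador = 'bad guy'
foe = ' I\'m your father, foe ! '
bar = thingy + foe"""

offsets = find_unquoted("foe", sample)
# Convert each offset to a 1-based line number for readability.
lines = [sample.count("\n", 0, pos) + 1 for pos in offsets]
print(lines)  # [1, 2, 4, 5]
```

The foe inside the quoted string on line 4 is swallowed by the first alternative, so only the assignment target on that line is reported.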

NOTE: To also skip double-quoted string literals, use r"('[^'\\]*(?:\\.[^'\\]*)*'|\"[^\"\\]*(?:\\.[^\"\\]*)*\")" as the captured first alternative, in place of the single-quote group.
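
One way the note above could be wired into a full substitution is sketched here. The helper name replace_unquoted, the constants SINGLE and DOUBLE, and the sample string are all assumptions made for this example; only the two literal sub-patterns come from the answer.

```python
import re

# Sub-patterns for quoted literals with backslash escapes.
SINGLE = r"'[^'\\]*(?:\\.[^'\\]*)*'"
DOUBLE = r'"[^"\\]*(?:\\.[^"\\]*)*"'

def replace_unquoted(word, replacement, text):
    """Replace whole-word `word` everywhere except inside '...' or "..." literals."""
    pattern = re.compile(
        "(" + SINGLE + "|" + DOUBLE + r")|\b" + re.escape(word) + r"\b"
    )
    # Group 1 is set only when a literal matched; put it back untouched.
    return pattern.sub(lambda m: m.group(1) if m.group(1) else replacement, text)

sample = """foe = "foe"; bar = 'foe' + foe"""
result = replace_unquoted("foe", "NEWORD", sample)
print(result)  # NEWORD = "foe"; bar = 'foe' + NEWORD
```

As with the single-quote version, this relies on quotes being balanced, so a stray quote in a comment would still throw it off.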




Answer 2:


You can try this:

((?!\'[\w\s]*)foe(?![\w\s]*\'))




Answer 3:


How about this regular expression:

>>> import re
>>> s = '''var foe = 10;
foe = "";
dark_vador = 'bad guy'
' I\m your father, foe ! '
bar = thingy + foe'''
>>>
>>> re.findall(r'(?!\'.*)foe(?!.*\')', s)
['foe', 'foe', 'foe']

The trick here is to make sure the expression does not match a foe that sits between a leading and a trailing ', and to account for the characters in between, hence the .* in the expression.
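
As a quick check of how this lookahead pattern behaves on the question's original sample (where line 4 starts with foe = rather than with the quote), the following sketch may help. The variable names are made up for the example, and the reading of the result is an observation about this one input, not a general claim about the approach.

```python
import re

original_sample = r"""var foe = 10;
foe = "";
dark_vador = 'bad guy'
foe = ' I\'m your father, foe ! '
bar = thingy + foe"""

# Same pattern as above; inside a double-quoted raw string the
# single quotes need no backslash.
lookahead_rx = re.compile(r"(?!'.*)foe(?!.*')")

hits = [m.start() for m in lookahead_rx.finditer(original_sample)]
hit_lines = [original_sample.count("\n", 0, pos) + 1 for pos in hits]
print(hit_lines)  # [1, 2, 5]
```

On this input the trailing lookahead rejects any foe that has a ' later on the same line, so the assignment target at the start of line 4 is skipped along with the quoted one.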




Answer 4:


((?!\'[\w\s]*[\\']*[\w\s]*)foe(?![\w\s]*[\\']*[\w\s]*\'))



Answer 5:


Capture group 1 of the following regular expression will contain matches of 'foe'.

r"^(?:[^'\n]|\\')*(?:(?<!\\)'(?:[^'\n]|\\')*(?:(?<!\\)')(?:[^'\n]|\\')*)*\b(foe)\b"

Python's regex engine performs the following operations.

^           : assert beginning of string (or of each line in multiline mode)
(?:         : begin non-capture group
  [^'\n]    : match any char other than single quote and line terminator
  |         : or
  \\'       : match '\' then a single quote
)           : end non-capture group   
*           : execute non-capture group 0+ times
(?:         : begin non-capture group
  (?<!\\)   : next char is not preceded by '\' (negative lookbehind)
  '         : match single quote
  (?:       : begin non-capture group
    [^'\n]  : match any char other than single quote and line terminator
    |       : or
    \\'     : match '\' then a single quote
  )         : end non-capture group   
  *         : execute non-capture group 0+ times
  (?:       : begin non-capture group
    (?<!\\) : next char is not preceded by '\' (negative lookbehind)
    '       : match single quote
  )         : end non-capture group
  (?:       : begin non-capture group
    [^'\n]  : match any char other than single quote and line terminator
    |       : or
    \\'     : match '\' then a single quote
  )         : end non-capture group   
  *         : execute non-capture group 0+ times
)           : end non-capture group
*           : execute non-capture group 0+ times
\b(foe)\b   : match 'foe' in capture group 1
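
The breakdown above can be tried out with a short script. This is only a sketch: it assumes the pattern is meant to run with re.MULTILINE so that ^ anchors at every line of the one big string, and the sample and variable names are chosen for the example.

```python
import re

# The pattern from this answer, written as a double-quoted raw string so
# the single quotes inside it do not end the Python literal.
rx = re.compile(
    r"^(?:[^'\n]|\\')*(?:(?<!\\)'(?:[^'\n]|\\')*(?:(?<!\\)')(?:[^'\n]|\\')*)*\b(foe)\b",
    re.MULTILINE,  # assumption: ^ should match at the start of each line
)

sample = r"""var foe = 10;
foe = "";
dark_vador = 'bad guy'
foe = ' I\'m your father, foe ! '
bar = thingy + foe"""

# Pair each capture with its 1-based line number.
matches = [
    (sample.count("\n", 0, m.start(1)) + 1, m.group(1))
    for m in rx.finditer(sample)
]
print(matches)  # [(1, 'foe'), (2, 'foe'), (4, 'foe'), (5, 'foe')]
```

Because every match is anchored at the start of a line, this form reports at most one foe per line, which is enough for the sample but may not be for other input.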


Source: https://stackoverflow.com/questions/41137995/regular-expression-match-word-not-between-quotes