Question
I have this text (it's a string value, not a language expression):
hello = world + 'foo bar' + gizmo.hoozit + "escaped \"quotes\"";
And I would like to find all words ([a-zA-Z]+) which are not enclosed in double or single quotes. The quotes can be escaped (\" or \'). The result should be:
hello, world, gizmo, hoozit
Can I do this using regular expressions in JavaScript?
Answer 1:
You can use this pattern; the words you want end up in the second capturing group.
EDIT: a slightly shorter version, using a negative lookahead:
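// Group 1 captures the opening quote; group 2 captures an unquoted word.
var re = /(['"])(?:[^"'\\]+|(?!\1)["']|\\{2}|\\[\s\S])*\1|([a-z]+)/ig;
var mystr = 'hello = world + \'foo bar\' + gizmo.hoozit + "escaped \\"quotes\\"";';
var result = [];
var match; // declared explicitly to avoid an implicit global
while ((match = re.exec(mystr)) !== null) {
    // Quoted strings match the first alternative and leave group 2 undefined.
    if (match[2]) result.push(match[2]);
}
console.log(mystr);
console.log(result);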
The idea is to match quoted content before the target, so that words inside quotes are consumed by the first alternative and never reach the second capturing group.
Details of the quoted-string part, (["'])(?:[^"'\\]+|(?!\1)["']|\\{2}|\\[\s\S])*\1:
(["'])       # an opening quote, single or double (capture group 1)
(?:          # open a non-capturing group
[^"'\\]+     # anything that is not a quote or a backslash
|            # OR
(?!\1)["']   # a quote, but not the captured one
|            # OR
\\{2}        # 2 backslashes (pairs make up any even number of backslashes)*
|            # OR
\\[\s\S]     # an escaped character (to allow escaped quotes)
)*           # repeat the group zero or more times
\1           # the closing quote, same kind as the opening one (backreference)
(* An even number of backslashes does not escape anything.)
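To make the pattern above easier to reuse, it can be wrapped in a small helper. This is only a sketch: the function name findUnquotedWords is hypothetical and not part of the original answer, and it assumes an environment that supports String.prototype.matchAll (ES2020).

```javascript
// Hypothetical helper (name not from the original answer): returns every run
// of letters that lies outside single- or double-quoted strings.
function findUnquotedWords(text) {
    // Alternative 1 consumes a whole quoted string (group 1 = opening quote);
    // alternative 2 captures an unquoted word in group 2.
    var re = /(['"])(?:[^"'\\]+|(?!\1)["']|\\{2}|\\[\s\S])*\1|([a-z]+)/ig;
    var words = [];
    for (var m of text.matchAll(re)) {
        if (m[2]) words.push(m[2]);
    }
    return words;
}

var input = 'hello = world + \'foo bar\' + gizmo.hoozit + "escaped \\"quotes\\"";';
console.log(findUnquotedWords(input));
// A double quote inside a single-quoted string does not end that string.
console.log(findUnquotedWords('a + \'it "is" x\' + b'));
```

For the string from the question this should give hello, world, gizmo, hoozit, and the second call should give a, b.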
Answer 2:
You might want to apply several regular expressions one after the other, for simplicity and clarity (a single large regex may be fast, but it is hard to construct, understand and edit): first remove all escaped quotes, then remove all quoted strings, then run your search.
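// `string` holds the input text.
var matches = string
    .replace( /\\'|\\"/g, '' )           // 1. remove escaped quotes
    .replace( /'[^']*'|"[^"]*"/g, '' )   // 2. remove quoted strings
    .match( /\w+/g );                    // 3. collect the remaining words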
A few notes on the regular expressions involved:
- The central construct in the 2nd replacement is the character ', followed by zero or more (*) of any character from the set ([]) that is not (^) the character ', followed by a closing '. The second half does the same for ".
- | means "or": either the part before or the part after the pipe can match.
- \w means "any word character" and is shorthand for [A-Za-z0-9_], so it also matches digits and underscores. Use [a-zA-Z]+ instead if only letters should match.
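The difference between \w and [a-zA-Z] only shows up when the input contains digits or underscores. A minimal illustration, using a made-up sample string that is not from the question:

```javascript
// Made-up sample input containing digits and an underscore.
var sample = "foo_1 = bar2 + baz";

// \w is equivalent to [A-Za-z0-9_], so digits and underscores stay attached.
var wordChars = sample.match(/\w+/g);

// [a-zA-Z]+ matches runs of letters only, as the question asks.
var lettersOnly = sample.match(/[a-zA-Z]+/g);

console.log(wordChars);
console.log(lettersOnly);
```

Here wordChars should contain foo_1, bar2, baz, while lettersOnly should contain foo, bar, baz.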
Answer 3:
- Replace each escaped quote with an empty string.
- Replace each pair of quotes, and the string between them, with an empty string:
  - If you use a capture group for the opening quote, (["']), then you can use a back-reference, \1, to match the same style of quote at the other end of the quoted string.
  - Matching with a back-reference means you need a non-greedy (match as few characters as possible) wildcard match, .*?, to get the shortest possible quoted string.
- Finally, find the matches using your regular expression, [a-zA-Z]+.
Like this:
var text = "hello = world + 'foo bar' + gizmo.hoozit + \"escaped \\\"quotes\\\"\";";
var matches = text.replace( /\\["']/g, '' )
.replace( /(["']).*?\1/g, '' )
.match( /[a-zA-Z]+/g );
console.log( matches );
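It can help to look at the intermediate string after each step. The sketch below runs the same three steps one at a time; the variable names withoutEscapes, withoutStrings and words are chosen here for illustration and do not appear in the original answer.

```javascript
var text = "hello = world + 'foo bar' + gizmo.hoozit + \"escaped \\\"quotes\\\"\";";

// Step 1: remove escaped quotes (\" and \').
var withoutEscapes = text.replace(/\\["']/g, '');

// Step 2: remove each quoted string; \1 matches the same kind of quote
// that opened it, and .*? stops at the first such quote.
var withoutStrings = withoutEscapes.replace(/(["']).*?\1/g, '');

// Step 3: collect the remaining runs of letters.
var words = withoutStrings.match(/[a-zA-Z]+/g);

console.log(withoutEscapes);
console.log(withoutStrings);
console.log(words);
```

After step 2 only the unquoted parts of the input should remain, so step 3 should find hello, world, gizmo, hoozit.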
Source: https://stackoverflow.com/questions/20351377/regexp-find-all-occurences-but-not-inside-quotes