Matching quote-wrapped strings in JavaScript with regex

偶尔善良 submitted on 2021-02-05 06:43:05

Question


I need a JavaScript regex that matches

"{any group of chars}" <-- where that last " is not preceded by a \

Examples:

... foo "bar" ...  => "bar"
... foo"bar\"" ... => "bar\""
... foo "bar" ...  goo"o"ooogle "t\"e\"st"[] => ["bar", "o", "t\"e\"st"]

The actual strings will be longer and may contain multiple matches, which could also include whitespace or regex special characters.

I started by trying to break down the syntax, but since I'm not strong with regex I got stuck pretty fast. I did get as far as matching everything except the case where the match contains \" (I think).

https://regex101.com/r/sj4HXw/1

UPDATE:

More about my situation:

This regex is used to "syntax highlight" strings in code blocks embedded in my blog posts, so a real-world example might look something like this:

<pre id="test" class="code" data-code="csharp">
   if (ConfigurationManager.AppSettings["LogSql"] == "true")
</pre>

I am using the following JavaScript to apply the highlighting:

var result = $("#test").text().replace(/"[^"\\]*(?:\\[\s\S][^"\\]*)*"/g, "<span class=\"string\">$1</span>");
$("#test").html(result);

For some reason, even when the answers suggested so far are used in this context, I'm getting odd results.

The matching works, but the literal text $1 is inserted instead of the actual match.


Answer 1:


Simple scenario (as in the OP)

The most efficient regex you can use here (written in accordance with the unroll-the-loop principle) is

"[^"\\]*(?:\\[\s\S][^"\\]*)*"

See the regex demo

Details:

  • " - match the first "
  • [^"\\]* - 0+ chars other than " and \
  • (?:\\[\s\S][^"\\]*)* - zero or more occurrences of:
    • \\[\s\S] - any char ([\s\S]) with a \ in front
    • [^"\\]* - 0+ chars other than " and \
  • " - a closing ".
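As a sanity check on the breakdown above, here is a minimal sketch that assembles the same pattern from its pieces. It uses `String.raw` so the backslashes need no extra escaping; the variable names are illustrative and not part of the original answer.

```javascript
// Each piece corresponds to one bullet in the breakdown
const quote = '"';                         // opening / closing "
const plainRun = String.raw`[^"\\]*`;      // 0+ chars other than " and \
const escapedChar = String.raw`\\[\s\S]`;  // a \ followed by any char (newlines included)

// Unrolled loop: a plain run, then zero or more (escaped char + plain run)
const body = `${plainRun}(?:${escapedChar}${plainRun})*`;
const rx = new RegExp(quote + body + quote, 'g');

// Same source as the regex literal given in the answer
console.log(rx.source === /"[^"\\]*(?:\\[\s\S][^"\\]*)*"/.source); // true
console.log('say "hi" and "b\\"ye"'.match(rx)); // ['"hi"', '"b\\"ye"']
```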

Usage:

// MATCHING
var rx = /"[^"\\]*(?:\\[\s\S][^"\\]*)*"/g;
var s = '    ... foo "bar" ...  goo"o"ooogle "t\\"e\\"st"[]';
var res = s.match(rx);
console.log(res);

// REPLACING
console.log(s.replace(rx, '<span>$&</span>'));
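The replacement string here uses `$&`, which inserts the entire match. `$1` refers to the first capturing group, and this pattern has none; in current JavaScript engines an unmatched `$1` is then copied through literally, which is the symptom described in the question. A minimal sketch comparing the options on the question's C# line:

```javascript
const rx = /"[^"\\]*(?:\\[\s\S][^"\\]*)*"/g;
const code = 'if (ConfigurationManager.AppSettings["LogSql"] == "true")';

// $1 with no capturing group: the text "$1" ends up in the output
const broken = code.replace(rx, '<span class="string">$1</span>');

// $& inserts the whole matched text
const fixed = code.replace(rx, '<span class="string">$&</span>');

// Alternative: wrap the pattern in a capturing group so that $1 exists
const grouped = code.replace(/("[^"\\]*(?:\\[\s\S][^"\\]*)*")/g, '<span class="string">$1</span>');

console.log(broken);
console.log(fixed);
// if (ConfigurationManager.AppSettings[<span class="string">"LogSql"</span>] == <span class="string">"true"</span>)
console.log(fixed === grouped); // true
```

In the question's jQuery snippet, changing `$1` to `$&` should be enough to get the matched text instead of a literal `$1`.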

More advanced scenario

If there is an escaped \" before a valid match, or there are backslashes before a ", the approach above won't work. You will need to match those backslashes as well and capture only the substring you need.

/(?:^|[^\\])(?:\\{2})*("[^"\\]*(?:\\[\s\S][^"\\]*)*")/g
 ^^^^^^^^^^^^^^^^^^^^^^                             ^

See another regex demo.

Usage:

// MATCHING
var rx = /(?:^|[^\\])(?:\\{2})*("[^"\\]*(?:\\[\s\S][^"\\]*)*")/g;
var s = '    ... \\"foo "bar" ...  goo"o"ooogle "t\\"e\\"st"[]';
var m, res=[];
while (m = rx.exec(s)) {
  res.push(m[1]);
}
console.log(res);

// REPLACING
console.log(s.replace(/((?:^|[^\\])(?:\\{2})*)("[^"\\]*(?:\\[\s\S][^"\\]*)*")/g, '$1<span>$2</span>'));

The main pattern is wrapped with capturing parentheses, and this is added at the start:

  • (?:^|[^\\]) - either start of string or any char but \
  • (?:\\{2})* - 0+ occurrences of a double backslash.
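To see what the prefix buys, the sketch below runs the simple and the advanced patterns on the sample string from the usage example above, which starts with an escaped \". It uses `String.prototype.matchAll` (ES2020) instead of the `exec` loop; both collect group 1.

```javascript
// Same input as the advanced usage example: note the escaped \" before foo
const s = '    ... \\"foo "bar" ...  goo"o"ooogle "t\\"e\\"st"[]';

const simple = /"[^"\\]*(?:\\[\s\S][^"\\]*)*"/g;
const advanced = /(?:^|[^\\])(?:\\{2})*("[^"\\]*(?:\\[\s\S][^"\\]*)*")/g;

// The simple pattern treats the escaped quote as an opening quote,
// so every following pair of quotes is shifted
console.log(s.match(simple));
// ['"foo "', '" ...  goo"', '"ooogle "', '"e\\"st"']

// The advanced pattern also consumes the preceding char, so read group 1
console.log(Array.from(s.matchAll(advanced), m => m[1]));
// ['"bar"', '"o"', '"t\\"e\\"st"']
```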



Answer 2:


Put the escaped-character alternative first:

"(\\.|[^"])*"

https://regex101.com/r/sj4HXw/2
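A minimal sketch of why the order of the alternatives matters, using the question's third example as a JavaScript string literal (each \" in the text is written `\\"`). The reversed pattern is only for comparison and does not come from any answer.

```javascript
const s = '... foo "bar" ...  goo"o"ooogle "t\\"e\\"st"[]';

// Escape pair tried first: \" is consumed as one unit
const escapeFirst = /"(\\.|[^"])*"/g;
console.log(s.match(escapeFirst)); // ['"bar"', '"o"', '"t\\"e\\"st"']

// Plain char tried first: the \ is taken alone, so the escaped " closes the string early
const plainFirst = /"([^"]|\\.)*"/g;
console.log(s.match(plainFirst)); // ['"bar"', '"o"', '"t\\"', '"st"']
```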




Answer 3:


This should do it:

"(\\[\s\S]|[^"\\])*"

It's a mixture of the other answers from Wiktor and Taufik.
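One practical difference from Answer 2 is that `[^"\\]` excludes the backslash, so the regex cannot backtrack into treating \ as an ordinary character. A minimal sketch with a hypothetical input whose only quote after the opening one is escaped (so, per the question's requirement, there is no valid match):

```javascript
const answer1 = /"[^"\\]*(?:\\[\s\S][^"\\]*)*"/g;
const answer2 = /"(\\.|[^"])*"/g;
const answer3 = /"(\\[\s\S]|[^"\\])*"/g;

// On the question's third example, Answers 1 and 3 agree
const sample = '... foo "bar" ...  goo"o"ooogle "t\\"e\\"st"[]';
console.log(JSON.stringify(sample.match(answer3)) === JSON.stringify(sample.match(answer1))); // true

// Hypothetical input: no unescaped closing quote after "abc
const unterminated = 'x = "abc\\" + y';

// Answer 2 backtracks, lets [^"] take the backslash, and ends on the escaped quote
console.log(unterminated.match(answer2)); // ['"abc\\"']

// Answers 1 and 3 never treat \ as a plain character, so nothing matches
console.log(unterminated.match(answer3)); // null
console.log(unterminated.match(answer1)); // null
```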



Source: https://stackoverflow.com/questions/43831523/matching-quote-wrapped-strings-in-javascript-with-regex
