Regex to source code comments

谁都会走 submitted on 2019-12-07 19:09:14

Question


Is there a regular expression that matches any line comment but ignores comment markers inside strings? I need everything on a line from // onward (including the // itself).

For example:

//Comment (match!)
bla bla bla bla //Comment (match!)
this string "foo // foo" (don't match because it's inside "")

Answer 1:


The following regular expression is meant to match string literals and regular-expression literals in the input:

var strings = /("((.|\\\n)*?([^\\"]|\\\\)|)"|'((.|\\\n)*?([^\\']|\\\\)|)'|\/[^*](.*([^\\\/]|\\\\))\/|\/\*\/)/g;

You can remove strings from the input and then match comments using another regular expression:

var comments = /((\/\/)(.*)|(\/\*)((.|\n)*)(\*\/))/g;
input.replace(strings, "").match(comments);

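Demo (JavaScript):

var strings = /("((.|\\\n)*?([^\\"]|\\\\)|)"|'((.|\\\n)*?([^\\']|\\\\)|)'|\/[^*](.*([^\\\/]|\\\\))\/|\/\*\/)/g,
    comments = /((\/\/)(.*)|(\/\*)((.|\n)*)(\*\/))/g;

// The two textareas from the HTML below
var input = document.getElementById("input"),
    output = document.getElementById("output");

function update() {
  // Strip strings first so a // inside quotes can no longer match, then collect comments
  var arr = input.value.replace(strings, "").match(comments);
  output.value = arr ? arr.join("\n") : "";
}

// Re-run on every edit of the input
input.onkeydown = input.onkeyup = input.onchange = update;
update();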
CSS:

textarea {
  width: 90%;
  height: 5em;
}

HTML:

<p>Input:</p>
<textarea id="input">
//Comment (match!)
bla bla bla bla //Comment (match!)
this string "foo // foo"
</textarea>

<p>Output:</p>
<textarea id="output">
</textarea>
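The demo above depends on the browser page. The same strip-then-match idea can also be packaged as a plain function; the sketch below is a minimal version assuming Node.js (or any modern JavaScript runtime), with a hypothetical function name findComments. The two patterns are copied unchanged from this answer, so they inherit its limitations.

```javascript
// Two-step approach: strip string/regex literals, then match what looks like a comment.
// Both patterns are copied unchanged from the answer above.
const strings = /("((.|\\\n)*?([^\\"]|\\\\)|)"|'((.|\\\n)*?([^\\']|\\\\)|)'|\/[^*](.*([^\\\/]|\\\\))\/|\/\*\/)/g;
const comments = /((\/\/)(.*)|(\/\*)((.|\n)*)(\*\/))/g;

// Hypothetical helper name; returns an array of comment strings (empty if none).
function findComments(source) {
  // match() with a /g regex returns every match, or null when there are none
  const found = source.replace(strings, "").match(comments);
  return found ? found : [];
}

const sample = [
  "//Comment (match!)",
  "bla bla bla bla //Comment (match!)",
  'this string "foo // foo"',
].join("\n");

console.log(findComments(sample));
```

For this sample it should return the two //Comment (match!) entries and nothing from the quoted string.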



Answer 2:


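This regex handles the examples in the question, provided strings are double-quoted and contain no escaped quotes. Use it with the m (multiline) flag so that ^ matches at the start of every line:

^(([^"\n]*"){2})*?[^"\n]*?(\/\/.*)

You want whatever the third capture group matches. Alternatively, you could make the first two groups non-capturing.

It works by skipping complete pairs of quotes, together with the text around them, from the start of the line before reaching the double slashes. Because [^"\n] cannot cross a quote, the // it captures always has an even number of quotes before it on that line and therefore lies outside any string; the lazy quantifiers make it stop at the first such //.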
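To try the even-number-of-quotes idea in code, here is a minimal JavaScript sketch. It assumes the anchored form ^(([^"\n]*"){2})*?[^"\n]*?(\/\/.*) with the g and m flags: the ^ anchor and the newline exclusion keep the quote counting on a single line, and the lazy quantifiers stop at the first // outside a string. The function name lineComments is hypothetical, and String.prototype.matchAll requires Node.js 12 or later.

```javascript
// From the start of each line (m flag), skip the fewest complete "..." pairs,
// then non-quote text, and capture the first // reached outside a string.
// Assumption: double-quoted strings only, no escaped quotes.
const evenQuotes = /^(([^"\n]*"){2})*?[^"\n]*?(\/\/.*)/gm;

// Hypothetical helper name.
function lineComments(source) {
  const found = [];
  for (const m of source.matchAll(evenQuotes)) {
    found.push(m[3]); // third capture group holds the comment
  }
  return found;
}

const evenSample = [
  "//Comment (match!)",
  "bla bla bla bla //Comment (match!)",
  'this string "foo // foo"',
  'x = "a"; // trailing',
].join("\n");

console.log(lineComments(evenSample));
```

On this sample it should yield the two //Comment (match!) entries plus // trailing, and nothing from the quoted "foo // foo".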




Answer 3:


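^[^"\n]*?(//.*)

This will not catch every case (a line such as x = "a"; // note is skipped, because a quote comes before the //), but at least your examples should work. Use it with the m (multiline) flag; the lazy *? makes it capture from the first // on the line.

Update: added the ^ that was missing at the beginning.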
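A minimal JavaScript sketch of this simpler approach. It assumes the m flag, [^"\n] rather than [^"] so that a match cannot run across lines, and a lazy *? so that the first // on the line wins; the function name simpleComments is hypothetical.

```javascript
// Capture from the first // on a line, but only if no double quote comes before it.
// Assumptions: m flag for per-line ^, newline excluded from the character class.
const noQuoteFirst = /^[^"\n]*?(\/\/.*)/gm;

// Hypothetical helper name; keeps only capture group 1 from each match.
function simpleComments(source) {
  return Array.from(source.matchAll(noQuoteFirst), (m) => m[1]);
}

const simpleSample = [
  "//Comment (match!)",
  "bla bla bla bla //Comment (match!)",
  'this string "foo // foo"',
  'x = "a"; // missed',
].join("\n");

console.log(simpleComments(simpleSample));
```

Only the first two lines of the sample produce a comment; the last line shows the limitation, since the quote before its // hides a real comment.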




Answer 4:


Here's another solution that should catch every single-line comment (see it work on regex101):

(\/\/.*)|"(?:\\"|.)*?"

All the comments are captured in the first capture group.

It works in any regex flavor that has lazy quantifiers, which is almost all of them. The technique is to match quoted strings explicitly so that they are consumed and can no longer supply text to the comment branch; you then keep only what the comment group captured. RexEgg.com explains this technique in detail as "The Greatest Regex Trick Ever".

Breakdown:

(\/\/.*) matches a comment and captures it in group 1.

"(?:\\"|.)*?" matches a quoted string, stepping over any escaped quotes inside it.

  • The inner non-capturing group (?:\\"|.) matches either an escaped quote or any single character, so an escaped quote is passed over instead of being treated as a "real" closing quote.
  • The group is repeated with the lazy quantifier *?, so the match stops at the next "real" quote instead of running on into another quoted string.
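Here is a minimal JavaScript sketch of this trick, assuming a runtime with String.prototype.matchAll (Node.js 12 or later); the function name trickComments is hypothetical. Every match is either a comment (group 1 set) or a quoted string (group 1 undefined), and only the comments are kept.

```javascript
// Match a comment OR a quoted string; quoted strings are consumed so a // inside
// them is never seen by the comment branch, and are then discarded.
const stringOrComment = /(\/\/.*)|"(?:\\"|.)*?"/g;

// Hypothetical helper name.
function trickComments(source) {
  const found = [];
  for (const m of source.matchAll(stringOrComment)) {
    if (m[1] !== undefined) found.push(m[1]); // group 1 is only set for comments
  }
  return found;
}

const trickSample = [
  "//Comment (match!)",
  "bla bla bla bla //Comment (match!)",
  'this string "foo // foo"',
  'say("she said \\"hi // there\\""); // real comment',
].join("\n");

console.log(trickComments(trickSample));
```

The last sample line checks the escaped-quote handling: the // inside the escaped-quote string is consumed as part of that string, so only // real comment is reported for that line.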


Source: https://stackoverflow.com/questions/27534037/regex-to-source-code-comments
