Regex for comments in strings, strings in comments, etc

廉价感情. Posted on 2019-12-27 12:07:45

Question


This is a question I've solved and wanted to post in Q&A style because I think more people could use the solution. Or maybe improve the solution, or show where it breaks.

The problem

You want to do something with quoted strings and/or comments in a body of text: extract them, highlight them, what have you. But some quoted strings are inside comments, and sometimes comment characters are inside strings. String delimiters can be escaped, and comments can be line comments or block comments. And just when you think you have a solution, somebody complains that it doesn't work when there's a regex literal in their JavaScript. What do you do?

Concrete example

var ret = row.match(/'([^']+)'/i); // Get 1st single quoted string's content
if (!ret) return ''; /* return if there's no matches 
                        Otherwise turn into xml: */
var message = '\t<' + ret[1].replace(/\[1]/g, '').replace(/\/@(\w+)/i, ' $1=""') + '></' + ret[1].match(/[A-Z_]\w*/i)[0] + '>';

alert('xml: \'' + message + '\''); /*
alert("xml: '" + message + "'"); // */

var line = prompt('How do line-comments start? (e.g. //)', '//');

// do something with line

This code is nonsense, but how do I do the right thing in each of the cases of the above JavaScript?

The only thing I found that comes close is this: Comments in string and strings in comments where Jan Goyvaerts himself answered with a similar approach. But that one doesn't handle apostrophe-escaping yet.


Answer 1:


I've broken the regex into 4 lines corresponding to the 4 paths in the graph; don't keep those line breaks in there if you ever use this.

(['"])(?:(?!\1|\\).|\\.)*\1|
\/(?![*/])(?:[^\\/]|\\.)+\/[igm]*|
\/\/[^\n]*(?:\n|$)|
\/\*(?:[^*]|\*(?!\/))*\*\/

Debuggex Demo

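This code grabs 4 types of "blocks", each of which can contain the other 3 as plain text: quoted strings, regex literals, line comments, and block comments (in the order of the 4 lines above). You can iterate through the matches and do whatever you want with each one, or discard it if it's not the kind you wanna do anything to.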
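As a rough sketch of that iteration (not part of the original answer): the four alternatives are joined back into one pattern and each match is labelled by how it starts. The names BLOCKS, classify, findBlocks and sample are made up for illustration, and String.prototype.matchAll is assumed to be available (Node 12+ or a modern browser).

```javascript
// The four lines of the regex above, joined into a single global pattern.
// The backreference \1 still points at the quote captured by the first group.
const BLOCKS = new RegExp(
  [
    /(['"])(?:(?!\1|\\).|\\.)*\1/.source,       // quoted string
    /\/(?![*/])(?:[^\\/]|\\.)+\/[igm]*/.source, // regex literal
    /\/\/[^\n]*(?:\n|$)/.source,                // line comment (includes the newline)
    /\/\*(?:[^*]|\*(?!\/))*\*\//.source,        // block comment
  ].join('|'),
  'g'
);

// Hypothetical helper: decide which alternative matched from the first characters.
function classify(text) {
  if (text[0] === '"' || text[0] === "'") return 'string';
  if (text.startsWith('//')) return 'line-comment';
  if (text.startsWith('/*')) return 'block-comment';
  return 'regex';
}

// Hypothetical helper: collect every block with its type and position.
function findBlocks(source) {
  const blocks = [];
  for (const m of source.matchAll(BLOCKS)) {
    blocks.push({ type: classify(m[0]), text: m[0], index: m.index });
  }
  return blocks;
}

// Illustrative input: a comment marker inside a string, quotes inside comments,
// and a regex literal that contains an escaped comment opener.
const sample = [
  "var a = 'it\\'s // here'; // a \"real\" comment",
  "var re = /\\/\\*/g; /* block with 'quote' */",
].join('\n');

for (const block of findBlocks(sample)) {
  console.log(block.type, JSON.stringify(block.text));
}
```

For this sample the loop reports a string, a line comment, a regex literal and a block comment, in that order. One place where it does break: an expression with two division operators such as a / b / c is picked up as the regex literal / b /, because the pattern has no notion of where a regex literal is syntactically allowed.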

This one is specific to JavaScript as it's a language I'm familiar with. But you could easily adapt this to the language of your preference.
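To give an idea of what such an adaptation might look like, here is a minimal sketch for SQL-like text, still written in JavaScript. It is an assumption on my part rather than something from the original answer: it presumes standard SQL conventions (single-quoted strings where '' is an escaped apostrophe, -- line comments, /* */ block comments, no regex literals), and real dialects differ. SQL_BLOCKS, findSqlBlocks and query are made-up names.

```javascript
// Hypothetical adaptation of the same idea to SQL-like text (assumed dialect:
// standard SQL; other dialects add # comments, backslash escapes, etc.).
const SQL_BLOCKS = new RegExp(
  [
    /'(?:[^']|'')*'/.source,             // string; '' is an escaped apostrophe
    /--[^\n]*(?:\n|$)/.source,           // line comment (includes the newline)
    /\/\*(?:[^*]|\*(?!\/))*\*\//.source, // block comment, same as the JavaScript one
  ].join('|'),
  'g'
);

// Hypothetical helper: list every string and comment with its type and position.
function findSqlBlocks(source) {
  return Array.from(source.matchAll(SQL_BLOCKS), (m) => {
    const text = m[0];
    const type = text[0] === "'" ? 'string'
      : text.startsWith('--') ? 'line-comment'
      : 'block-comment';
    return { type, text, index: m.index };
  });
}

// Illustrative input: a comment marker inside a string and a quote inside a comment.
const query = "SELECT 'it''s -- not a comment' AS s -- trailing\n/* block 'x' */ FROM t";

for (const block of findSqlBlocks(query)) {
  console.log(block.type, JSON.stringify(block.text));
}
```

The structure stays the same as in the JavaScript version: one alternative per kind of block, and each alternative consumes its whole block so that delimiters of the other kinds inside it are never seen as block starts.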

Anyone see a way in which this code breaks?

Edit: I have since been notified that the general pattern is described very well here: https://stackoverflow.com/a/23589204/2684660, neato!



Source: https://stackoverflow.com/questions/25402109/regex-for-comments-in-strings-strings-in-comments-etc
