Regex to match specific functions and their arguments in files

没有蜡笔的小新 2021-01-16 07:33

I'm working on a gettext JavaScript parser and I'm stuck on the parsing regex.

I need to catch every argument passed to a specific method call _n( and

6 answers
  •  忘掉有多难
    2021-01-16 08:02

    \(( |"(\\"|[^"])*"|'(\\'|[^'])*'|[^)"'])*?\)

    This should match anything between a pair of parentheses, ignoring parentheses inside quotes. Explanation:

    \( // Literal open paren
        (
             | //Space or
            "(\\"|[^"])*"| //Anything between two double quotes, including escaped quotes, or
            '(\\'|[^'])*'| //Anything between two single quotes, including escaped quotes, or
            [^)"'] //Any character that isn't a quote or close paren
        )*? // All of that, repeated as few times as needed to reach the close paren (lazy)
    \) // Literal close paren
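
To try the pattern above on real input, here is a minimal sketch in JavaScript. It assumes the target function is literally named _n, and it adds a \b_n prefix and an outer capturing group, neither of which is part of the pattern above. The helper name findCallArguments is hypothetical. Nested parentheses outside quotes, comments, and template literals are not handled.

```javascript
// Hypothetical helper: returns the raw argument text of every _n(...) call.
// Assumption: the function name is "_n"; the inner groups are made
// non-capturing so that group 1 holds everything between the parentheses.
function findCallArguments(text) {
  const pattern = /\b_n\(((?: |"(?:\\"|[^"])*"|'(?:\\'|[^'])*'|[^)"'])*?)\)/g;
  // matchAll needs the "g" flag and yields one match object per call site.
  return Array.from(text.matchAll(pattern), (m) => m[1]);
}

const source = [
  'var a = _n("one (1) apple", "%d apples", count);',
  "var b = _n('single', 'plural', n);",
  'var c = (x * y);', // a plain grouping, skipped because of the _n prefix
].join('\n');

const found = findCallArguments(source);
// found[0] is '"one (1) apple", "%d apples", count'
// found[1] is "'single', 'plural', n"
```

The parentheses inside the first quoted string do not end the match, because the quoted-string alternatives consume them before the close-paren test is reached.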
    

    No matter how you slice it, regular expressions are going to cause problems for a task like this. They're hard to read, hard to maintain, and can be very inefficient when they backtrack. I'm unfamiliar with gettext, but perhaps you could use a for loop?

    // A sketch in JavaScript. A loop like this can be more readable, maintainable, and predictable than a regular expression.
    // `capture` is a caller-supplied callback that receives the text between each pair of parens.
    function scanParens(input, capture) {
        for (let i = 0; i < input.length; i++) {
            // Ignore anything that isn't an opening paren
            if (input[i] !== '(') continue;
            let capturedText = '';
            i++; // Step past the opening paren
            // Loop until a close paren or the end of the input is reached
            while (i < input.length && input[i] !== ')') {
                const quote = input[i];
                if (quote === '"' || quote === "'") {
                    capturedText += input[i++]; // Opening quote
                    // Copy until an unescaped matching quote or the end of the input is reached
                    while (i < input.length && input[i] !== quote) {
                        // Keep a backslash together with the character it escapes
                        if (input[i] === '\\' && i + 1 < input.length) capturedText += input[i++];
                        capturedText += input[i++];
                    }
                }
                // Closing quote, or an ordinary character
                if (i < input.length) capturedText += input[i++];
            }
            capture(capturedText);
        }
    }
    

    Note: I didn't cover how to determine whether a paren opens a function call or is just a grouping symbol (i.e., this will also match a = (b * c)). That's complicated, as is covered in detail here. As your code gets more and more accurate, you get closer and closer to writing your own JavaScript parser. You might want to take a look at the source code for actual JavaScript parsers if you need that sort of accuracy.
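
As a rough illustration of the function-versus-grouping check mentioned in the note, the sketch below looks backwards from an opening paren for a given function name. The helper isCallTo and its parameters are hypothetical, and this is only a crude text test: it knows nothing about comments, strings, or regex literals, which is where a real parser becomes necessary.

```javascript
// Hypothetical helper: returns true when the "(" at parenIndex is directly
// preceded by `name` (whitespace in between is allowed) and `name` is not
// just the tail of a longer identifier such as "foo_n".
function isCallTo(input, parenIndex, name) {
  // Skip whitespace between the name and the paren, e.g. "_n (".
  let end = parenIndex;
  while (end > 0 && /\s/.test(input[end - 1])) end--;

  const start = end - name.length;
  if (start < 0 || input.slice(start, end) !== name) return false;

  // Reject the match if an identifier character comes right before the name.
  const before = start > 0 ? input[start - 1] : '';
  return !/[A-Za-z0-9_$]/.test(before);
}

const callExample = 'a = _n("x")';      // "(" is at index 6
const groupExample = 'a = (b * c)';     // "(" is at index 4
const suffixExample = 'foo_n(1)';       // "(" is at index 5

const results = [
  isCallTo(callExample, 6, '_n'),   // true: a call to _n
  isCallTo(groupExample, 4, '_n'),  // false: just a grouping
  isCallTo(suffixExample, 5, '_n'), // false: a different function, foo_n
];
```

A check like this could be run on each opening paren before capturing its contents, so that groupings such as a = (b * c) are skipped.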
