Regex to match MySQL comments

Posted by 谁说我不能喝 on 2019-12-09 03:24:50

Question


I need to find and remove all of the comments from a MySQL query. The problem I'm having is avoiding comment markers (--, #, /* ... */) that are inside of quotes or backticks.


Answer 1:


In PHP, I'm using this code to strip comments from SQL:

$sqlComments = '@(([\'"`])(?:\\\\.|(?!\2)[^\\\\])*\2)|((?:\#|--).*?$|/\*(?:[^/*]|/(?!\*)|\*(?!/)|(?R))*\*\/)\s*|(?<=;)\s+@ms';
/* Commented version
$sqlComments = '@
    (([\'"`])(?:\\\\.|(?!\2)[^\\\\])*\2) # $1 : Skip single & double quoted + backticked expressions (empty ones included)
    |(                   # $3 : Match comments
        (?:\#|--).*?$    # - Single line comments
        |                # - Multi line (nested) comments
         /\*             #   . comment open marker
            (?: [^/*]    #   . non comment-marker characters
                |/(?!\*) #   . ! not a comment open
                |\*(?!/) #   . ! not a comment close
                |(?R)    #   . recursive case
            )*           #   . repeat eventually
        \*\/             #   . comment close marker
    )\s*                 # Trim after comments
    |(?<=;)\s+           # Trim after semi-colon
    @msx';
*/
$uncommentedSQL = trim( preg_replace( $sqlComments, '$1', $sql ) );
preg_match_all( $sqlComments, $sql, $comments );
$extractedComments = array_filter( $comments[ 3 ] );
var_dump( $uncommentedSQL, $extractedComments );
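The same idea (match quoted text first so it can be put back, and drop whatever matches as a comment) can be sketched in Python. This is only an illustration under a few assumptions: the name `strip_sql_comments` is made up here, block comments are treated as non-nested because Python's `re` module has no recursion, and any `--` starts a comment, as in the pattern above.

```python
import re

# Quoted text is captured in group 1 so it can be kept verbatim;
# comments fall into group 2 and are dropped.
SQL_TOKEN = re.compile(
    r"""
    (                               # group 1: quoted text to keep
        '(?:[^'\\]|\\.|'')*'        #   single-quoted string
      | "(?:[^"\\]|\\.|"")*"        #   double-quoted string
      | `(?:[^`]|``)*`              #   backtick-quoted identifier
    )
    | (                             # group 2: comment to drop
        (?:\#|--)[^\r\n]*           #   single-line comment
      | /\*.*?\*/                   #   block comment (assumed not nested)
    )
    """,
    re.VERBOSE | re.DOTALL,
)


def strip_sql_comments(sql: str) -> str:
    """Return sql with comments removed and quoted text left untouched."""
    return SQL_TOKEN.sub(lambda m: m.group(1) or "", sql)


if __name__ == "__main__":
    print(strip_sql_comments("SELECT '-- kept', `a#b` FROM t -- dropped"))
```

Comments are replaced with nothing here, so `1/**/2` would collapse to `12`; substituting a single space instead may be the safer choice if that matters for your queries.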



Answer 2:


Unfortunately, what you are trying to do requires a context-free grammar and cannot be done with a regular expression. It's because of the nesting: in computer science theory, you need a stack to track when you are nested in quotes or what-not. (Technically this calls for a pushdown automaton; a regular language is not enough. Blah blah academia blah...) It isn't hard to implement, but it has to be done procedurally, and honestly, it may require more effort than you want to expend.

If you don't mind cutting and pasting, you can use SQLInform. The online mode is free and supports comment removal.

UPDATE

Considering the comment I received below, I played around with the MySQL editor. I was mistaken -- they've actually prohibited nesting anything deeper than one level. You can no longer nest a comment inside a comment (if you ever could). At any rate, I'll leave my answer up just for the SQLInform link.
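With no nesting to track, the procedural approach mentioned in this answer reduces to a single left-to-right scan. The following Python sketch is one way it might look, not a definitive implementation: the function name `strip_comments_scan` is hypothetical, any `--` is taken as a comment start, and an unterminated quote is simply copied through to the end of the input.

```python
def strip_comments_scan(sql: str) -> str:
    """Single-pass scanner: copy quoted runs verbatim, skip comments."""
    out = []
    i, n = 0, len(sql)
    while i < n:
        ch = sql[i]
        if ch in "'\"`":
            # Inside quotes: find the matching close, honoring backslash
            # escapes (not for backticks) and doubled quote characters.
            j = i + 1
            while j < n:
                if sql[j] == "\\" and ch != "`":
                    j += 2
                    continue
                if sql[j] == ch:
                    if j + 1 < n and sql[j + 1] == ch:
                        j += 2  # doubled quote stays inside the literal
                        continue
                    break
                j += 1
            out.append(sql[i:j + 1])
            i = j + 1
        elif ch == "#" or sql.startswith("--", i):
            # Single-line comment: skip to end of line, keep the newline.
            while i < n and sql[i] not in "\r\n":
                i += 1
        elif sql.startswith("/*", i):
            # Block comment (assumed not nested): skip past the first "*/".
            end = sql.find("*/", i + 2)
            i = n if end == -1 else end + 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

Because the scanner always knows whether it is inside a quoted run, comment markers inside quotes or backticks are never examined as markers at all.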




Answer 3:


Someone has written it for you. Convert to whichever language you require.

Use Regular Expressions to Clean SQL Statements

Here is the C# translation included in the Answer in case the original link ever goes away. I haven't tested this, but it looks sound.

public static string ToRaw(string commandText)
{
    RegexOptions regExOptions = (RegexOptions.IgnoreCase | RegexOptions.Multiline);
    string rawText=commandText;
    string regExText = @"('(''|[^'])*')|([\r\n]\s*[\r\n])|(--[^\r\n]*)|(/\*[\w\W]*?(?=\*/)\*/)";
    //string regExText = @"('(''|[^'])*')|[\t\r\n]|(--[^\r\n]*)|(/\*[\w\W]*?(?=\*/)\*/)";
    // Replace Tab, Carriage Return, Line Feed, Single-row Comments and
    // Multi-row Comments with a space when not included inside a text block.

    MatchCollection patternMatchList = Regex.Matches(rawText, regExText, regExOptions);
    int iSkipLength = 0;
    for (int patternIndex = 0; patternIndex < patternMatchList.Count; patternIndex++)
    {
        if (!patternMatchList[patternIndex].Value.StartsWith("'") && !patternMatchList[patternIndex].Value.EndsWith("'"))
        {
            rawText = rawText.Substring(0, patternMatchList[patternIndex].Index - iSkipLength) + " " + rawText.Substring(patternMatchList[patternIndex].Index - iSkipLength + patternMatchList[patternIndex].Length);
            iSkipLength += (patternMatchList[patternIndex].Length - " ".Length);
        }
    }
    // Remove extra spacing that is not contained inside text qualifiers.
    patternMatchList = Regex.Matches(rawText, "'([^']|'')*'|[ ]{2,}", regExOptions);
    iSkipLength = 0;
    for (int patternIndex = 0; patternIndex < patternMatchList.Count; patternIndex++)
    {
        if (!patternMatchList[patternIndex].Value.StartsWith("'") && !patternMatchList[patternIndex].Value.EndsWith("'"))
        {
            rawText = rawText.Substring(0, patternMatchList[patternIndex].Index - iSkipLength) + " " + rawText.Substring(patternMatchList[patternIndex].Index - iSkipLength + patternMatchList[patternIndex].Length);
            iSkipLength += (patternMatchList[patternIndex].Length - " ".Length);
        }
    }
    // Return value without leading and trailing spaces.
    return rawText.Trim();
}



Answer 4:


This code works for me:

function strip_sqlcomment ($string = '') {
    $RXSQLComments = '@(\'(\'\'|[^\'])*\')|(--[^\r\n]*)|(\#[^\r\n]*)|(/\*[\w\W]*?(?=\*/)\*/)@ms';
    // Quoted strings are captured in $1 and put back; comments are dropped.
    return (($string == '') ?  '' : preg_replace( $RXSQLComments, '$1', $string ));
}

With a little regex tweaking, it could be used to strip comments in other languages as well.




Answer 5:


Unfortunately, you can do only very limited SQL formatting with regular expressions. The main reason is that some comments must not be removed, and some tokens cannot be lowercased or uppercased because they are part of a literal. It is not always easy to find where a literal begins and ends, since SQL dialects use different enclosing characters and sometimes enclose a literal in several characters. People also sometimes put pieces of SQL in comments for later reuse, and you do not want to reformat those. When you change a SQL statement with a regular expression, run the changed SQL again in your DB tool to make sure you did not change its logic. I have heard of people running regular expressions over hundreds of SQL files without checking the results. I think this is a very dangerous step. Never change a running SQL ;-)
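The "run it again and compare" advice can be partly automated. The sketch below is only an illustration and rests on several assumptions: it uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for your real DB tool, the helper name `same_rows` is made up, and SQLite's comment and quoting syntax is not identical to MySQL's, so a real check should run against MySQL itself.

```python
import sqlite3


def same_rows(setup_sql: str, original: str, stripped: str) -> bool:
    """Run both versions of a query on the same data and compare the rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(setup_sql)  # create tables and sample rows
        before = conn.execute(original).fetchall()
        after = conn.execute(stripped).fetchall()
    finally:
        conn.close()
    return before == after


if __name__ == "__main__":
    setup = (
        "CREATE TABLE t (x INTEGER);"
        "INSERT INTO t VALUES (1);"
        "INSERT INTO t VALUES (2);"
    )
    original = "SELECT x /* note */ FROM t -- tail\nWHERE x > 1"
    stripped = "SELECT x  FROM t \nWHERE x > 1"
    print(same_rows(setup, original, stripped))
```

Identical rows on sample data do not prove the two statements are equivalent, but a mismatch is a cheap early warning that the rewrite changed the logic.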



Source: https://stackoverflow.com/questions/7100127/regex-to-match-mysql-comments
