Regular expression to extract SQL query

青春惊慌失措 2020-12-18 08:19

Is there a regex which extracts SQL queries from a string? I'm NOT interested in validating any SQL syntax, but rather only in extracting a selection o

5 answers
  • 2020-12-18 08:27

    I'll start off by saying that this is not a good way of doing it, and I strongly urge you to find another method, preferably tagging the statements properly where they are made, so you don't end up in this situation.

    That being said, this approach assumes that each statement starts with one of the following keywords: DELETE, SELECT, WITH, UPDATE or INSERT INTO. It also assumes that each statement ends with ;.

    We can use this to grab all sequences matching SQL with the following:

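    // Group 1 captures the leading keyword; quoted literals may contain ';' and may be empty ('').
    final String regex = "^(INSERT INTO|UPDATE|SELECT|WITH|DELETE)(?:[^;']|(?:'[^']*'))+;\\s*$";
    // MULTILINE lets ^ and $ match at line boundaries (DOTALL is harmless here: the pattern has no '.').
    final Pattern p = Pattern.compile(regex, Pattern.MULTILINE | Pattern.DOTALL);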
    

    Group 1 now holds the leading keyword, in case you wish to filter the matches by statement type, for example UPDATE or SELECT.

    See the regex in action, as well as a caveat, here:

    https://regex101.com/r/dt9XTK/2
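As a rough illustration of how the pattern from this answer could be applied, here is a minimal sketch. The class name `SqlExtractor`, the method `extract` and the sample log text are made up for this example, and quoted literals are written as `'[^']*'` so that empty literals (`''`) are accepted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
public class SqlExtractor {
    // The pattern from this answer; '[^']*' lets empty literals ('') through.
    private static final Pattern SQL = Pattern.compile(
        "^(INSERT INTO|UPDATE|SELECT|WITH|DELETE)(?:[^;']|'[^']*')+;\\s*$",
        Pattern.MULTILINE);

    /** Returns every match as a pair {leading keyword, full statement}. */
    public static List<String[]> extract(String text) {
        List<String[]> found = new ArrayList<>();
        Matcher m = SQL.matcher(text);
        while (m.find()) {
            // group(1) is the keyword; group() is the whole statement including ';'.
            found.add(new String[] { m.group(1), m.group().trim() });
        }
        return found;
    }

    public static void main(String[] args) {
        // Invented sample input: two statements mixed with unrelated lines.
        String log = "starting job\n"
            + "SELECT name FROM users\n"
            + "WHERE note = 'a;b';\n"
            + "done\n"
            + "UPDATE users SET active = 1;\n";
        for (String[] hit : extract(log)) {
            System.out.println(hit[0] + " -> " + hit[1]);
        }
    }
}
```

The semicolon inside `'a;b'` does not end the first statement, because quoted literals are consumed as a unit. A keyword that appears in the middle of a line is still not recognised as the start of a statement.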

  • 2020-12-18 08:29

    You can match it "properly" as long as the semicolon is the last non-whitespace character on that line.

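    // Requires java.util.regex.Matcher and java.util.regex.Pattern.
    // Lazily match from a leading keyword up to the first ';' that ends a line.
    final String regex = "^(SELECT|UPDATE|INSERT)[\\s\\S]+?;\\s*?$";

    final Pattern p = Pattern.compile(regex, Pattern.MULTILINE);
    final Matcher matcher = p.matcher(content);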
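To make the "last non-whitespace character on that line" condition concrete, here is a small sketch. The class name `LazyMatchDemo`, the method `firstStatement` and both inputs are invented for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original answer.
public class LazyMatchDemo {
    private static final Pattern P = Pattern.compile(
        "^(SELECT|UPDATE|INSERT)[\\s\\S]+?;\\s*?$", Pattern.MULTILINE);

    /** Returns the first statement found, or null if there is none. */
    public static String firstStatement(String content) {
        Matcher matcher = P.matcher(content);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        // A ';' in the middle of a line is skipped: it is not followed by end of line.
        System.out.println(firstStatement("SELECT a; b FROM t\nWHERE c = 1;\n"));
        // A ';' that ends a line terminates the match, even inside a string literal.
        System.out.println(firstStatement("SELECT 'x;\ny' FROM t;\n"));
    }
}
```

The second call shows the limitation: the pattern has no notion of quoted literals, so a literal that happens to contain a semicolon at the end of a line cuts the statement short.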
    
  • 2020-12-18 08:29

    If you're dealing with a language, create a lexer that tokenizes your string. Use JFlex, which is a lexical analyzer generator. It generates a Java class that splits a string into tokens based on a grammar specified in a special file. Take the relevant grammar rules from this file.

    Parsing is a separate process from tokenization (or lexical analysis). If lexical analysis alone is not enough, you might want to use a parser generator on top of it.
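To give a feel for what tokenization buys you, here is a tiny hand-written lexer. It is only a stand-in for illustration: it is not JFlex output, it does not use a JFlex grammar, and the names `TinySqlLexer`, `Token` and `Type` are made up. It recognises just three token kinds and ignores comments and quoted identifiers.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-rolled lexer; a generated lexer would replace this class.
public class TinySqlLexer {
    public enum Type { WORD, STRING, SYMBOL }

    public static final class Token {
        public final Type type;
        public final String text;
        Token(Type type, String text) { this.type = type; this.text = text; }
        @Override public String toString() { return type + "(" + text + ")"; }
    }

    /** Splits input into words, single-quoted strings and one-character symbols. */
    public static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;                                   // whitespace separates tokens
            } else if (c == '\'') {
                int end = i + 1;
                // Scan to the closing quote; a doubled '' stays inside the literal.
                while (end < input.length()) {
                    if (input.charAt(end) == '\'') {
                        if (end + 1 < input.length() && input.charAt(end + 1) == '\'') {
                            end += 2;
                            continue;
                        }
                        break;
                    }
                    end++;
                }
                // An unterminated literal simply runs to the end of the input.
                int stop = Math.min(end + 1, input.length());
                tokens.add(new Token(Type.STRING, input.substring(i, stop)));
                i = stop;
            } else if (Character.isLetterOrDigit(c) || c == '_') {
                int end = i;
                while (end < input.length()
                        && (Character.isLetterOrDigit(input.charAt(end)) || input.charAt(end) == '_')) {
                    end++;
                }
                tokens.add(new Token(Type.WORD, input.substring(i, end)));
                i = end;
            } else {
                tokens.add(new Token(Type.SYMBOL, String.valueOf(c)));
                i++;
            }
        }
        return tokens;
    }
}
```

Once the input is a token list, a statement boundary is simply a `SYMBOL` token with text `;`, and a semicolon inside a `STRING` token can no longer be mistaken for one.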

  • 2020-12-18 08:31

    SQL is complicated enough that you will need context to find all statements, meaning that you can't do this with a regular expression.

    For example:

    SELECT Model FROM Product
    WHERE ManufacturerID IN (SELECT ManufacturerID FROM Manufacturer 
    WHERE Manufacturer = 'Dell')
    

    (The example comes from http://www.sql-tutorial.com/sql-nested-queries-sql-tutorial/.) Queries can be nested multiple levels deep, start with different keywords, and so on. Even if you could write a regular expression for the subset you are interested in, it would be unreadable.
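The kind of context meant here can be sketched with a simple counter. The class `NestingDepth` and its method `selectDepths` are invented for this illustration. The sketch assumes that parentheses and the word SELECT never occur inside string literals or comments, which real SQL does not guarantee.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration of the state a regex cannot easily track.
public class NestingDepth {
    /** Returns the parenthesis depth at which each SELECT keyword occurs, in order. */
    public static List<Integer> selectDepths(String sql) {
        List<Integer> depths = new ArrayList<>();
        String upper = sql.toUpperCase(Locale.ROOT);
        int depth = 0;
        for (int i = 0; i < upper.length(); i++) {
            char c = upper.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
            } else if (upper.startsWith("SELECT", i)
                    && isBoundary(upper, i - 1) && isBoundary(upper, i + 6)) {
                depths.add(depth);   // depth 0 is a top-level query, 1+ a subquery
            }
        }
        return depths;
    }

    // True if the index is outside the string or does not hold an identifier character.
    private static boolean isBoundary(String s, int index) {
        if (index < 0 || index >= s.length()) {
            return true;
        }
        char c = s.charAt(index);
        return !(Character.isLetterOrDigit(c) || c == '_');
    }
}
```

For the query above, the outer SELECT is found at depth 0 and the inner one at depth 1. Telling the two apart requires remembering how many parentheses are open, which is exactly the state a plain regular expression does not carry.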

    ANTLR has a SQL 2003 grammar available (I haven't tried it).

  • 2020-12-18 08:46

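    (?m)^(UPDATE|SELECT|INSERT INTO).*;$ should work. The (?m) flag makes ^ and $ match at the start and end of every line, so you can loop through the input and find each statement that starts and ends on the same line. Because . does not match line terminators by default, statements that span several lines additionally need the (?s) flag and a lazy quantifier (.*?).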

    Looking at the example you provided, it will match your commands up to the ;. You can see the example used for testing here.
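The effect of the inline flags can be checked with a short sketch. The class `FlagDemo`, the method `findAll` and the sample input are made up; the second pattern, with `(?ms)` and a lazy quantifier, is a variation added here for comparison and is not part of the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class comparing two flag combinations.
public class FlagDemo {
    // (?m): ^ and $ match at line boundaries; '.' still stops at line terminators.
    static final Pattern SINGLE_LINE =
        Pattern.compile("(?m)^(UPDATE|SELECT|INSERT INTO).*;$");
    // (?ms) with a lazy quantifier: '.' also crosses line terminators.
    static final Pattern MULTI_LINE =
        Pattern.compile("(?ms)^(UPDATE|SELECT|INSERT INTO).*?;$");

    /** Collects every match of the given pattern in the text. */
    static List<String> findAll(Pattern pattern, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // Invented input: one single-line and one two-line statement.
        String text = "SELECT a FROM t;\nUPDATE t\nSET a = 1;\n";
        System.out.println(findAll(SINGLE_LINE, text));
        System.out.println(findAll(MULTI_LINE, text));
    }
}
```

With this input, the first pattern only finds the SELECT, because the UPDATE statement continues on the next line; the second pattern finds both.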
