Regular expression to find all table names in a query

前端 未结 12 980
一个人的身影
一个人的身影 2020-12-03 18:09

I am not that hot at regular expressions and it has made my little mind melt some what.

I am trying to find all the tables names in a query. So say I have the query

相关标签:
12条回答
  • 2020-12-03 18:55

    I'm pretty late to the party however I thought I would share a regex I am currently using to analyse all our database objects and I disagree with the sentiment that it is not possible to do this using one.

    The regex has a few assumptions

    1) You are not using the A,B join syntax style

    2) Whatever regex parser you are using supports ignore case.

    3) You're analyzing, selects, joins, updates, deletes and truncates. It doesn't support the aforementioned MERGE/NATURAL because we don't use them, however I'm sure further support wouldn't be difficult to add.

    I am keen to know what type of transaction the table is part of so I have included Named Capture groups to tell me.

    Now I've not used regex for a long time so there are probably improvements that can be made however so far in all my testing this is accurate.

    \bjoin\s+(?<Retrieve>[a-zA-Z\._\d]+)\b|\bfrom\s+(?<Retrieve>[a-zA-Z\._\d]+)\b|\bupdate\s+(?<Update>[a-zA-Z\._\d]+)\b|\binsert\s+(?:\binto\b)?\s+(?<Insert>[a-zA-Z\._\d]+)\b|\btruncate\s+table\s+(?<Delete>[a-zA-Z\._\d]+)\b|\bdelete\s+(?:\bfrom\b)?\s+(?<Delete>[a-zA-Z\._\d]+)\b
    
    0 讨论(0)
  • 2020-12-03 18:56

    RegEx isn't very good at this, as it's a lot more complicated than it appears:

    • What if they use LEFT/RIGHT INNER/OUTER/CROSS/MERGE/NATURAL joins instead of the a,b syntax? The a,b syntax should be avoided anyway.
    • What about nested queries?
    • What if there is no table (selecting a constant)
    • What about line breaks and other whitespace formatting?
    • Alias names?

    I could go on.

    What you can do is look for an sql parser, and run your query through that.

    0 讨论(0)
  • 2020-12-03 18:56

    I used this code as an Excel macro to parse the select and extract table names.

    My parsing assumes that the syntax select from a , b , c is not used.

    Just run it against your SQL query and if you are not satisfied with the result you should be only a few lines of codes away from the result you expect. Just debug and modify the code accordingly.

    Sub get_tables()
        sql_query = Cells(5, 1).Value
        tables = ""
    
        'get all tables after from
        sql_from = sql_query
    
        While InStr(1, UCase(sql_from), UCase("from")) > 0
    
            i = InStr(1, UCase(sql_from), UCase("from"))
            sql_from = Mid(sql_from, i + 5, Len(sql_from) - i - 5)
            i = InStr(1, UCase(sql_from), UCase(" "))
    
            While i = 1
    
                sql_from = Mid(sql_from, 2, Len(sql_from) - 1)
                i = InStr(1, UCase(sql_from), UCase(" "))
    
            end
    
            i = InStr(1, sql_join, Chr(9))
    
            While i = 1
    
                sql_join = Mid(sql_join, 2, Len(sql_join) - 1)
                i = InStr(1, sql_join, Chr(9))
    
            end
    
            a = InStr(1, UCase(sql_from), UCase(" "))
            b = InStr(1, sql_from, Chr(10))
            c = InStr(1, sql_from, Chr(13))
            d = InStr(1, sql_from, Chr(9))
    
            MinC = a
    
            If MinC > b And b > 0 Then MinC = b
            If MinC > c And c > 0 Then MinC = c
            If MinC > d And d > 0 Then MinC = d
    
            tables = tables + "[" + Mid(sql_from, 1, MinC - 1) + "]"
    
        end
    
        'get all tables after join
        sql_join = sql_query
    
        While InStr(1, UCase(sql_join), UCase("join")) > 0
    
            i = InStr(1, UCase(sql_join), UCase("join"))
            sql_join = Mid(sql_join, i + 5, Len(sql_join) - i - 5)
            i = InStr(1, UCase(sql_join), UCase(" "))
    
            While i = 1
    
                sql_join = Mid(sql_join, 2, Len(sql_join) - 1)
                i = InStr(1, UCase(sql_join), UCase(" "))
    
            end
    
            i = InStr(1, sql_join, Chr(9))
    
            While i = 1
    
                sql_join = Mid(sql_join, 2, Len(sql_join) - 1)
                i = InStr(1, sql_join, Chr(9))
    
            end
    
            a = InStr(1, UCase(sql_join), UCase(" "))
            b = InStr(1, sql_join, Chr(10))
            c = InStr(1, sql_join, Chr(13))
            d = InStr(1, sql_join, Chr(9))
    
            MinC = a
    
            If MinC > b And b > 0 Then MinC = b
            If MinC > c And c > 0 Then MinC = c
            If MinC > d And d > 0 Then MinC = d
    
            tables = tables + "[" + Mid(sql_join, 1, MinC - 1) + "]"
    
        end
    
        tables = Replace(tables, ")", "")
        tables = Replace(tables, "(", "")
        tables = Replace(tables, " ", "")
        tables = Replace(tables, Chr(10), "")
        tables = Replace(tables, Chr(13), "")
        tables = Replace(tables, Chr(9), "")
        tables = Replace(tables, "[]", "")
    
    End Sub
    
    0 讨论(0)
  • 2020-12-03 19:00

    In PHP, I use this function, it returns an array with the table names used in a sql statement:

    function sql_query_get_tables($statement){
        preg_match_all("/(from|into|update|join) [\\'\\´]?([a-zA-Z0-9_-]+)[\\'\\´]?/i",
                $statement, $matches);
        if(!empty($matches)){
            return array_unique($matches[2]);
        }else return array();
    }
    

    Notice that it does not work with a,b joins or schema.tablename naming

    I hope it works for you

    0 讨论(0)
  • 2020-12-03 19:01

    I found this site that has a GREAT parser!

    http://www.sqlparser.com/

    well worth it. Works a treat.

    0 讨论(0)
  • 2020-12-03 19:08

    Constructing a regular expression is going to be the least of your problems. Depending on the flavor of SQL you expect to support with this code, the number of ways you can reference a table in a SQL statement is staggering.

    PLUS, if the query includes a reference to a view or UDF, the information about what underlying tables won't even be in the string at all making it completely impractical to get that information by parsing it. Also, you'd need to be smart about detecting temporary tables and excluding them from your results.

    If you must do this, a better approach would be to make use of the APIs to the particular database engine that the SQL was intended for. For example you could create a view based on the query and then use the DB Server api to detect dependencies for that view. The DB engine is going to be able to parse it much more reliably than you ever will without an enormous effort to reverse engineer the query engine.

    If, by chance, you are working with SQL Server, here is an article about detecting dependencies on that platform: Finding Dependencies in SQL Server 2005

    0 讨论(0)
提交回复
热议问题