I am not that good with regular expressions, and this has made my mind melt somewhat.
I am trying to find all the table names in a query. So say I have the query
I tried all of the above, but none worked because I use a wide variety of queries. I'm working with PHP and used a PEAR library called SQL_Parser, but I hope my solution helps anyway. I was also having trouble with apostrophes and MySQL reserved words, so I decided to strip the whole field list from the query before parsing it.
function getQueryTable($query) {
    require_once "SQL/Parser.php";

    $parser = new SQL_Parser();
    $parser->setDialect('MySQL');

    // Strip the field list so apostrophes and reserved words in it cannot confuse the parser.
    $queryType = substr(strtoupper(ltrim($query)), 0, 6);
    if ($queryType == 'SELECT') {
        $query = "SELECT * " . stristr($query, "FROM");
    }

    // Drop everything from HAVING onwards.
    $havingPos = stripos($query, 'HAVING');
    if ($havingPos !== false) {
        $query = substr($query, 0, $havingPos);
    }

    $struct = $parser->parse($query);
    $tableReferences = $struct[0]['from']['table_references']['table_factors'];

    $tables = array();
    foreach ((array) $tableReferences as $ref) {
        // Prefix the table with its database name when one is given.
        $tables[] = (!empty($ref['database']) ? $ref['database'] . '.' : '') . $ref['table'];
    }
    return $tables;
}
One workaround is to adopt a naming convention for tables and views. The SQL statement can then be parsed by looking for the naming prefix.
For example:
SELECT tbltable1.one, tbltable1.two, tbltable2.three
FROM tbltable1
INNER JOIN tbltable2
ON tbltable1.one = tbltable2.three
Split on whitespace into an array:
("SELECT","tbltable1.one,","tbltable1.two,","tbltable2.three","FROM","tbltable1","INNER","JOIN","tbltable2","ON","tbltable1.one","=","tbltable2.three")
Keep the part of each element to the left of the period:
("SELECT","tbltable1","tbltable1","tbltable2","FROM","tbltable1","INNER","JOIN","tbltable2","ON","tbltable1","=","tbltable2")
Remove elements with symbols:
("SELECT","tbltable1","tbltable1","tbltable2","FROM","tbltable1","INNER","JOIN","tbltable2","ON","tbltable1","tbltable2")
Reduce to unique values:
("SELECT","tbltable1","tbltable2","FROM","INNER","JOIN","ON")
Keep the elements whose first three characters are "tbl":
("tbltable1","tbltable2")
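The steps above can be sketched in a few lines of Python. This is only an illustration of the idea, not a general parser: the function name `tables_by_prefix` and the default prefix `"tbl"` are assumptions made for the example.

```python
import re

def tables_by_prefix(sql, prefix="tbl"):
    """Return names starting with `prefix`, following the steps described above."""
    tokens = sql.split()                          # split on whitespace
    tokens = [t.split(".")[0] for t in tokens]    # keep the part left of the period
    tokens = [t for t in tokens if re.fullmatch(r"\w+", t)]  # drop elements with symbols
    unique = list(dict.fromkeys(tokens))          # unique values, original order kept
    return [t for t in unique if t.startswith(prefix)]

query = """SELECT tbltable1.one, tbltable1.two, tbltable2.three
FROM tbltable1
INNER JOIN tbltable2
ON tbltable1.one = tbltable2.three"""

print(tables_by_prefix(query))
```

Note that a table name written with trailing punctuation and no period, such as the first name in `FROM tbltable1, tbltable2`, is dropped by the "remove elements with symbols" step, so a real implementation would probably want to strip punctuation first.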
Everything has already been said about the usefulness of such a regex in the SQL context. If you insist on a regex, and your SQL statements always look like the one you showed (that means no subqueries, joins, and so on), you could use
FROM\s+([^ ,]+)(?:\s*,\s*([^ ,]+))*\s+
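One thing to watch with this pattern is that a repeated capturing group only keeps its last repetition, so with three or more tables the groups alone do not return every name. A possible workaround, sketched here in Python, is to take the whole match and split it on commas. The function name `tables_from_simple_query` is made up for the example, and the case-insensitive flag is an addition.

```python
import re

# The pattern from above, compiled case-insensitively (the flag is an addition).
FROM_LIST = re.compile(r"FROM\s+([^ ,]+)(?:\s*,\s*([^ ,]+))*\s+", re.IGNORECASE)

def tables_from_simple_query(sql):
    """Return the comma-separated names that follow FROM in a simple query."""
    m = FROM_LIST.search(sql)
    if m is None:
        return []
    # Drop the leading FROM keyword, then split the remaining list on commas.
    table_list = m.group(0)[len("FROM"):]
    return [name.strip() for name in table_list.split(",")]

print(tables_from_simple_query("SELECT a FROM t1, t2, t3 WHERE x = 1"))
```

Because the pattern ends in `\s+`, a query that stops right after the last table name (no trailing whitespace) does not match at all.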
This will pull out the table name from an INSERT INTO query (it expects the opening parenthesis to follow the table name directly, with no space):
(?<=(INTO)\s)[^\s]*(?=\(())
The following does the same for a SELECT, including joins:
(?<=(from|join)\s)[^\s]*(?=\s(on|join|where))
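To see how the two patterns above behave, here is a small Python sketch. The patterns are slightly adapted: the capturing groups are replaced with non-capturing ones so the whole match is the table name, and matching is made case-insensitive. The function names are made up for the example.

```python
import re

# Adapted from the patterns above: non-capturing groups, case-insensitive matching.
INSERT_TABLE = re.compile(r"(?<=INTO\s)[^\s]*(?=\()", re.IGNORECASE)
JOINED_TABLE = re.compile(
    r"(?<=(?:from|join)\s)[^\s]*(?=\s(?:on|join|where))", re.IGNORECASE
)

def insert_table(sql):
    """Table name of an INSERT INTO statement, or None if the pattern does not match."""
    m = INSERT_TABLE.search(sql)
    return m.group(0) if m else None

def select_tables(sql):
    """Table names that follow FROM or JOIN and precede ON, JOIN or WHERE."""
    return [m.group(0) for m in JOINED_TABLE.finditer(sql)]

print(insert_table("INSERT INTO users(id, name) VALUES (1, 'x')"))
print(select_tables("select * from a join b on a.id = b.id where a.x = 1"))
```

The limits show up quickly: `insert_table` finds nothing when there is a space before the parenthesis, and `select_tables` misses a table that is followed by `inner join` (or by an alias) rather than directly by `on`, `join` or `where`.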
Finally, going back to INSERT: if you want to return just the values held in an INSERT query, use the following regex:
(?i)(?<=VALUES[ ]*\().*(?=\))
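This pattern relies on a variable-length lookbehind (`[ ]*` inside `(?<=...)`), which engines such as .NET accept but Python's built-in `re` module rejects. A rough equivalent for Python, shown here as a sketch with a made-up function name, uses a capturing group instead of lookarounds.

```python
import re

# Capturing-group variant of the pattern above; Python's re module only
# supports fixed-width lookbehind, so the lookarounds are dropped.
VALUES_BODY = re.compile(r"VALUES[ ]*\((.*)\)", re.IGNORECASE)

def insert_values(sql):
    """Text between the parentheses after VALUES, or None if there is no match."""
    m = VALUES_BODY.search(sql)
    return m.group(1) if m else None

print(insert_values("INSERT INTO t(a, b) VALUES (1, 'x')"))
```

As in the original, `.*` is greedy, so with several rows such as `VALUES (1, 2), (3, 4)` everything up to the last closing parenthesis is returned as one string.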
I know this is an old thread, but it may help someone else looking around.
Enjoy
It's definitely not easy.
Consider subqueries.
select
    *
from
    A
    join (
        select
            top 5 *
        from
            B)
        on B.ID = A.ID
where
    A.ID in (
        select
            ID
        from
            C
        where C.DOB = A.DOB)
There are three tables used in this query.
I think it would be easier to tokenize the string and look for SQL keywords that could bound the table names. You know the names will follow FROM, but they could be followed by WHERE, GROUP BY, HAVING, or no keyword at all if they're at the end of the query.
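A minimal sketch of that tokenizing approach in Python might look like the following. It is an illustration under several assumptions, not a complete SQL parser: the keyword sets, the function name `find_tables`, and the decision to also treat JOIN as introducing a table name are all choices made for this example, and string literals or comments that contain keywords are not handled.

```python
import re

# Keywords after which a table name is expected (including JOIN is an assumption).
INTRODUCERS = {"FROM", "JOIN"}
# Keywords that end the list of table names (a partial, assumed set).
BOUNDARIES = {"SELECT", "WHERE", "GROUP", "HAVING", "ORDER", "ON", "UNION"}

def find_tables(sql):
    """Collect the names that follow FROM/JOIN (or a comma in a FROM list)."""
    # Tokens are identifiers (optionally dotted), parentheses and commas.
    tokens = re.findall(r"[A-Za-z_][\w.]*|[(),]", sql)
    tables = []
    expecting = False   # the next identifier is a table name
    in_from = False     # we are inside a FROM list, so commas separate tables
    for tok in tokens:
        upper = tok.upper()
        if upper in INTRODUCERS:
            expecting, in_from = True, True
        elif upper in BOUNDARIES:
            expecting, in_from = False, False
        elif tok == "(":
            expecting = False            # a subquery starts, not a table name
        elif tok == ")":
            expecting, in_from = False, False
        elif tok == ",":
            expecting = in_from          # another table only inside a FROM list
        elif expecting:
            if tok not in tables:
                tables.append(tok)
            expecting = False            # anything that follows is an alias
    return tables

query = """select * from A
join (select top 5 * from B) on B.ID = A.ID
where A.ID in (select ID from C where C.DOB = A.DOB)"""

print(find_tables(query))
```

On the subquery example above this finds the three tables, because each inner FROM switches the scanner back into "expecting a table name" mode.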