I am not that hot at regular expressions, and this has made my little mind melt somewhat.
I am trying to find all the table names in a query. So say I have the query
I tried all of the above, but none of it worked because I use a wide variety of queries. I'm working with PHP and used a PEAR library called SQL_Parser, but I hope my solution helps anyway. I was also having trouble with apostrophes and MySQL reserved words, so I decided to strip the whole field list from the query before parsing it.
function getQueryTable($query) {
    require_once "SQL/Parser.php";
    $parser = new SQL_Parser();
    $parser->setDialect('MySQL');

    // Strip the field list: replace everything before FROM with "SELECT *"
    $queryType = substr(strtoupper($query), 0, 6);
    $fromClause = stristr($query, "FROM");
    if ($queryType == 'SELECT' && $fromClause !== false) {
        $query = "SELECT * " . $fromClause;
    }
    // Cut the query off at HAVING, if present
    $havingPos = stripos($query, 'HAVING');
    if ($havingPos !== false) {
        $query = substr($query, 0, $havingPos);
    }

    $struct = $parser->parse($query);
    $tableReferences = $struct[0]['from']['table_references']['table_factors'];

    // Collect names as "database.table", or just "table" when no database is given
    $tables = array();
    foreach ((array) $tableReferences as $ref) {
        $tables[] = (!empty($ref['database']) ? $ref['database'] . '.' : '') . $ref['table'];
    }
    return $tables;
}
One workaround is to implement a naming convention on tables and views. Then the SQL statement can be parsed on the naming prefix.
For example:
SELECT tbltable1.one, tbltable1.two, tbltable2.three
FROM tbltable1
INNER JOIN tbltable2
ON tbltable1.one = tbltable2.three
Split on whitespace into an array:
("SELECT","tbltable1.one,","tbltable1.two,","tbltable2.three","FROM","tbltable1","INNER","JOIN","tbltable2","ON","tbltable1.one","=","tbltable2.three")
Keep only the part of each element to the left of the period:
("SELECT","tbltable1","tbltable1","tbltable2","FROM","tbltable1","INNER","JOIN","tbltable2","ON","tbltable1","=","tbltable2")
Remove elements with symbols:
("SELECT","tbltable1","tbltable1","tbltable2","FROM","tbltable1","INNER","JOIN","tbltable2","ON","tbltable1","tbltable2")
Reduce to unique values:
("SELECT","tbltable1","tbltable2","FROM","INNER","JOIN","ON")
Filter on the leftmost 3 characters = "tbl":
("tbltable1","tbltable2")
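The steps above can be sketched in a few lines of Python. This is only an illustration under the assumption that every table and view carries the prefix; `tables_by_prefix` is a hypothetical helper name, and like the manual steps it ignores names that have a comma or parenthesis attached without a period.

```python
def tables_by_prefix(sql, prefix="tbl"):
    # Step 1: split on whitespace
    tokens = sql.split()
    # Step 2: keep the part of each element to the left of the period
    left = [tok.split(".", 1)[0] for tok in tokens]
    # Step 3: remove elements that contain symbols (underscores are tolerated)
    words = [tok for tok in left if tok.replace("_", "").isalnum()]
    # Step 4: reduce to unique values, keeping first-seen order
    unique = list(dict.fromkeys(words))
    # Step 5: keep only names that start with the naming prefix
    return [tok for tok in unique if tok.lower().startswith(prefix)]


query = """SELECT tbltable1.one, tbltable1.two, tbltable2.three
FROM tbltable1
INNER JOIN tbltable2
ON tbltable1.one = tbltable2.three"""
print(tables_by_prefix(query))
```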
Everything already said about the limited usefulness of such a regex in the SQL context still applies. If you insist on a regex, and your SQL statements always look like the one you showed (that means no subqueries, joins, and so on), you could use
FROM\s+([^ ,]+)(?:\s*,\s*([^ ,]+))*\s+
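A minimal Python sketch of how that pattern might be applied; `tables_from_clause` is a hypothetical helper name and the case-insensitive flag is an addition. Because a repeated capture group keeps only its last match, the sketch splits the whole matched text on commas instead of reading the groups.

```python
import re

# The pattern from the answer; IGNORECASE is an assumption added here.
FROM_LIST = re.compile(r"FROM\s+([^ ,]+)(?:\s*,\s*([^ ,]+))*\s+", re.IGNORECASE)


def tables_from_clause(sql):
    match = FROM_LIST.search(sql)
    if match is None:
        return []
    # Drop the leading FROM keyword, then split the comma-separated list.
    table_list = re.sub(r"^FROM\s+", "", match.group(0), flags=re.IGNORECASE)
    return [name.strip() for name in table_list.split(",")]


print(tables_from_clause("SELECT a, b FROM t1, t2, t3 WHERE a = 1"))
```

Note that the pattern ends in `\s+`, so a query that stops right after the last table name does not match at all.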
This will pull out the table name from an INSERT INTO query:
(?<=(INTO)\s)[^\s]*(?=\(())
The following does the same for a SELECT that includes joins:
(?<=(from|join)\s)[^\s]*(?=\s(on|join|where))
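As a rough illustration, both patterns can be tried in Python, since their lookbehinds are fixed-width. The helper names are hypothetical, and the case-insensitive flag on the second pattern is an assumption. `finditer` is used rather than `findall` because the patterns contain capture groups, and `findall` would return those groups instead of the table names.

```python
import re

INSERT_TABLE = re.compile(r"(?<=(INTO)\s)[^\s]*(?=\(())")
SELECT_TABLES = re.compile(
    r"(?<=(from|join)\s)[^\s]*(?=\s(on|join|where))", re.IGNORECASE
)


def insert_table(sql):
    # The whole match (group 0) is the table name.
    match = INSERT_TABLE.search(sql)
    return match.group(0) if match else None


def select_tables(sql):
    return [m.group(0) for m in SELECT_TABLES.finditer(sql)]


print(insert_table("INSERT INTO users(id, name) VALUES (1, 'bob')"))
print(select_tables("SELECT * FROM a JOIN b ON a.id = b.id WHERE a.x = 1"))
```

Both patterns are strict about layout: the first needs the opening parenthesis directly after the table name, and the second needs `on`, `join`, or `where` right after it, so table aliases or a query that ends at the table name produce no match.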
Finally, going back to inserts: if you want to return just the values held in an INSERT query, use the following regex:
(?i)(?<=VALUES[ ]*\().*(?=\))
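This pattern relies on a variable-length lookbehind (`VALUES[ ]*`), which engines such as .NET accept but Python's standard `re` module rejects. A hedged Python equivalent, assuming a capture group is an acceptable substitute for the lookarounds (`insert_values` is a hypothetical name):

```python
import re

# Same idea as the pattern above, with a capture group instead of lookarounds.
VALUES_BODY = re.compile(r"VALUES[ ]*\((.*)\)", re.IGNORECASE)


def insert_values(sql):
    match = VALUES_BODY.search(sql)
    return match.group(1) if match else None


print(insert_values("INSERT INTO users(id, name) VALUES (1, 'bob')"))
```

As in the original, `.*` is greedy and runs to the last closing parenthesis on the line, so a multi-row insert comes back as one string.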
I know this is an old thread but it may assist someone else looking around
Enjoy
It's definitely not easy.
Consider subqueries.
select
    *
from
    A
    join (
        select
            top 5 *
        from
            B) B
        on B.ID = A.ID
where
    A.ID in (
        select
            ID
        from
            C
        where C.DOB = A.DOB)
There are three tables used in this query.
I think it would be easier to tokenize the string and look for SQL keywords that could bound the table names. You know the names will follow FROM, but they could be followed by WHERE, GROUP BY, HAVING, or no keyword at all if they're at the end of the query.
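A minimal sketch of that tokenizing idea in Python. The keyword set, the state names, and `find_tables` are all assumptions made for illustration; the sketch does not handle quoted identifiers, string literals that contain keywords, or constructs such as `EXTRACT(... FROM ...)`.

```python
import re

# Keywords that end a table list (an assumed, non-exhaustive set).
BOUNDARY = {"WHERE", "GROUP", "HAVING", "ORDER", "ON", "INNER", "LEFT",
            "RIGHT", "FULL", "CROSS", "UNION", "LIMIT"}
PUNCTUATION = ("(", ")", ",")


def find_tables(sql):
    # Tokens are identifiers (optionally dotted) plus parentheses and commas.
    tokens = re.findall(r"[A-Za-z_][\w.]*|[(),]", sql)
    tables = []
    state = "idle"
    for tok in tokens:
        upper = tok.upper()
        if upper in ("FROM", "JOIN"):
            state = "expect_table"
        elif state == "expect_table":
            if tok in PUNCTUATION or upper in BOUNDARY:
                state = "idle"  # e.g. a subquery; its own FROM is seen later
            else:
                if tok not in tables:
                    tables.append(tok)
                state = "after_table"
        elif state == "after_table":
            if tok == ",":
                state = "expect_table"  # another table in the same list
            elif tok in PUNCTUATION or upper in BOUNDARY:
                state = "idle"
            # anything else is treated as an alias and skipped
    return tables


print(find_tables("SELECT * FROM t1, t2 x WHERE t1.a = x.a"))
```

Because nested `FROM` keywords are picked up as the scan continues, the same loop also reaches tables inside subqueries, and a table name at the very end of the query needs no closing keyword.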