问题
I have searched high and low and am really struggling to find the appropriate REGEX that would help me retrieve what I want. Assume I have the following query string:
SELECT col, col2 AS c2, col3, col * col2 calc FROM...
I want a REGEX that will pull out everything between commas where there is a space or an "AS" that proceeds a column alias. It should also exclude the SELECT and FROM. With the example above, I would want the following matches:
col2 AS c2
col * col2 calc
Essentially, this should remove col and col3 because they don't have aliases. I have tried variations of the following but have been unsuccessful (Note: $colname contains the alias):
/(SELECT|,)([^,]*{$colname}[^,]*)(FROM|,)/
Currently this only returns:
, col2 AS c2,
col2 AS c2
, col2 AS c2,
col2 AS c2
I feel I am close but I need some guidance. Any suggestions are appreciated.
回答1:
It is next to impossible to handle SQL string parsing with one regex, you should look for a more generic solution like PHP-SQL-Parser.
The regex I have come up with is an ad-hoc work around that can be further precised, adjusted, etc. to fit the needs of a specific project:
(?:\G(?!\A),|SELECT)(?:\s+[^\s,]+(?:\([^()]*\))?,)*\s*\K[^\s,]+\s*(?:(?![()',])[[:punct:]]|AS)\s*(?:(?!FROM)[^,])*(?!\w)
See the regex demo.
Here is what it does:
(?:\G(?!\A),|SELECT)- the end of the previous successful match and a comma after (\G(?!\A),) or (|) a literalSELECTsubstring(?:\s+[^\s,]+(?:\([^()]*\))?,)*- zero or more sequences ((?:...)*) of:\s+- 1+ whitespaces[^\s,]+- 1+ chars other than whitespace and a comma(?:\([^()]*\))?- an optional substring like(...)(a(, then 0+ chars other than(and), and then a)),- a comma
\s*- 0+ whitespace chars\K- a match reset operator discarding the text matched so far (so,SELECT abc, Contains(...),will get omitted from the match, but will be required for the match)[^\s,]+- 1 or more chars other than whitespace and,\s*- 0+ whitespace chars(?:(?![()',])[[:punct:]]|AS)- either(?![()',])[[:punct:]]- any punctuation or symbol char but a(,),'and,|- orAS- a literalASsubstring
\s*- 0+ whitespaces(?:(?!FROM)[^,])*- zero or more chars other than a comma not starting aFROMchar sequence (it is a tempered greedy token)(?!\w)- not followed with a word char.
来源:https://stackoverflow.com/questions/48965423/getting-alias-columns-in-query-using-regex