Question
I am building a simple parser that takes a query like the following: 'show fizi commits from 1/1/2010 to 11/2/2006'. So far I have:
from pyparsing import CaselessKeyword, Combine, Optional, Word, alphas, nums

class QueryParser(object):
    def parser(self, stmnt):
        keywords = ["select", "from", "to", "show", "commits", "where", "group by", "order by", "and", "or"]
        [select, _from, _to, show, commits, where, groupby, orderby, _and, _or] = [CaselessKeyword(word) for word in keywords]
        user = Word(alphas+"."+alphas)
        user2 = Combine(user + "'s")
        startdate = self.getdate()
        enddate = self.getdate()
        bnf = (show|select)+(user|user2).setResultsName("user")+(commits).setResultsName("stats")\
            +Optional(_from+startdate.setResultsName("start")+_to+enddate.setResultsName("end"))
        a = bnf.parseString(stmnt)
        return a

    def getdate(self):
        integer = Word(nums).setParseAction(lambda t: int(t[0]))
        date = Combine(integer('year') + '/' + integer('month') + '/' + integer('day'))
        #date.setParseAction(self.convertToDatetime)
        return date
I would like the dates to be more generic, meaning the user can provide '20 Jan, 2010' or some other date format. I found a good date-parsing function online that does exactly that: it takes a date as a string and parses it. So what I am left with is feeding that function the date strings I get from my parser. How do I go about tokenizing and capturing the two date strings? For now the parser only captures the 'y/m/d' format. Is there a way to just get the entire string regardless of how it's formatted, something like capturing whatever comes right after the keywords from and to? Any help is greatly appreciated.
Answer 1:
A simple approach is to require that the date be quoted. A rough example is something like this, but you'll need to adjust it to fit in with your current grammar if need be:
from pyparsing import CaselessKeyword, quotedString, removeQuotes
from dateutil.parser import parse as parse_date
dp = (
    CaselessKeyword('from') + quotedString.setParseAction(removeQuotes)('from') +
    CaselessKeyword('to') + quotedString.setParseAction(removeQuotes)('to')
)
res = dp.parseString('from "jan 20" to "apr 5"')
from_date = parse_date(res['from'])
to_date = parse_date(res['to'])
# dateutil fills in the missing year from the current date, so when run in 2015:
# from_date, to_date == (datetime.datetime(2015, 1, 20, 0, 0), datetime.datetime(2015, 4, 5, 0, 0))
Answer 2:
I suggest using something like sqlparse that already handles all the weird edge cases for you. It might be a better option in the long term, if you have to deal with more advanced cases.
EDIT: Why not just parse the date blocks as strings? Like so:
from pyparsing import CaselessKeyword, Word, Combine, Optional, alphas, nums
class QueryParser(object):
    def parser(self, stmnt):
        keywords = ["select", "from", "to", "show", "commits", "where",
                    "groupby", "order by", "and", "or"]
        [select, _from, _to, show, commits, where, groupby, orderby, _and, _or]\
            = [CaselessKeyword(word) for word in keywords]
        user = Word(alphas + "." + alphas)
        user2 = Combine(user + "'s")
        startdate = Word(alphas + nums + "/")
        enddate = Word(alphas + nums + "/")
        bnf = (
            (show | select) + (user | user2).setResultsName("user") +
            (commits).setResultsName("stats") +
            Optional(
                _from + startdate.setResultsName("start") +
                _to + enddate.setResultsName("end"))
        )
        a = bnf.parseString(stmnt)
        return a
This gives me something like:
In [3]: q.parser("show fizi commits from 1/1/2010 to 11/2/2006")
Out[3]: (['show', 'fizi', 'commits', 'from', '1/1/2010', 'to', '11/2/2006'], {'start': [('1/1/2010', 4)], 'end': [('11/2/2006', 6)], 'stats': [('commits', 2)], 'user': [('fizi', 1)]})
Then you can use libraries like delorean or arrow that try to deal intelligently with the date part - or just use regular old dateutil.
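To illustrate that last step without pulling in a third-party library, here is a minimal sketch that tries the captured strings against a short list of formats with datetime.strptime. It is a stand-in for delorean, arrow, or dateutil, not a replacement for them. The helper name to_datetime and the format list are assumptions made up for this example, and trying month-first before day-first is an arbitrary choice.

```python
from datetime import datetime

# Assumed candidate formats, tried in order; month-first wins for ambiguous
# strings such as 4/3/2013. Adjust the list to the formats you want to accept.
CANDIDATE_FORMATS = ("%m/%d/%Y", "%d/%m/%Y", "%d %b, %Y", "%d %B %Y")


def to_datetime(text, formats=CANDIDATE_FORMATS):
    """Return a datetime for the first format that matches text.

    Hypothetical helper, not part of pyparsing or dateutil.
    """
    for fmt in formats:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError("unrecognized date: %r" % text)


# Strings as captured under the "start" and "end" results names above.
start = to_datetime("1/1/2010")
end = to_datetime("11/2/2006")
print(start.date(), end.date())
```

A fixed format list is predictable but strict; a library such as dateutil accepts far more variations at the cost of guessing in ambiguous cases.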
Answer 3:
You can make the pyparsing parser very lenient in what it matches, and then have a parse action do the more rigorous value checking. This is especially easy if your date strings are all non-whitespace characters.
For example, say we wanted to parse for a month name, but for some reason did not want our parser expression to just do oneOf('January February March ...etc.'). We could put in a placeholder that will just parse a Word group of characters up to the next non-eligible character (whitespace or punctuation).
monthName = Word(alphas.upper(), alphas.lower())
So here our month starts with a capitalized letter, followed by 0 or more lowercase letters. Obviously this will match many non-month names, so we will add a parse action to do additional validation:
def validate_month(tokens):
    import calendar
    monthname = tokens[0]
    print "check if %s is a valid month name" % monthname
    if monthname not in calendar.month_name:
        raise ParseException(monthname + " is not a valid month abbreviation")

monthName.setParseAction(validate_month)
If we do these two statements:
print monthName.parseString("January")
print monthName.parseString("Foo")
we get
check if January is a valid month name
['January']
check if Foo is a valid month name
Traceback (most recent call last):
  File "dd.py", line 15, in <module>
    print monthName.parseString("Foo")
  File "c:\python27\lib\site-packages\pyparsing.py", line 1125, in parseString
    raise exc
pyparsing.ParseException: Foo is not a valid month abbreviation (at char 0), (line:1, col:1)
(Once you are done testing, you can remove the print statement from the middle of the parse action - I just included it to show that it was being called during the parsing process.)
If you can get away with a date format that contains no internal whitespace, so that each date is a single space-delimited token, then you could write your parser as:
date = Word(nums,nums+'/-')
and then you could accept 1/1/2001, 29-10-1929, and so forth. Again, this will also match strings like 32237--/234//234/7, which is obviously not a valid date, so you could write a validating parse action to check the string's validity. In the parse action, you could implement your own validating logic, or call out to an external library. (You will have to be wary of dates like '4/3/2013' if you are being tolerant of different locales, since conventions differ on month-first vs. day-first order, and this string could mean either April 3rd or March 4th.) You can also have the parse action do the actual conversion for you, so that when you process the parsed tokens, the date will already be an actual Python datetime.
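As a rough sketch of such a converting parse action (written for Python 3, unlike the snippets above), the lenient Word expression can be paired with a function that tries a couple of formats and raises ParseException when none fits. The function name convert_to_datetime and the two formats in DATE_FORMATS are assumptions for illustration; in particular, reading slash dates as month-first and dash dates as day-first is just one possible convention.

```python
from datetime import datetime

from pyparsing import ParseException, Word, nums

# Assumed conventions for this sketch: slashes mean month/day/year,
# dashes mean day-month-year.
DATE_FORMATS = ("%m/%d/%Y", "%d-%m-%Y")


def convert_to_datetime(s, loc, tokens):
    """Validate the loosely matched token and replace it with a datetime."""
    text = tokens[0]
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    # Nothing matched: report a parse failure at this location.
    raise ParseException(s, loc, "%r is not a valid date" % text)


# Lenient match (digits, slashes, dashes); the parse action does the checking.
date = Word(nums, nums + "/-").setParseAction(convert_to_datetime)

print(date.parseString("1/1/2001")[0])
print(date.parseString("29-10-1929")[0])
```

With this in place, a string such as 32237--/234//234/7 still matches the Word expression but is rejected by the parse action, so parseString raises ParseException instead of returning a bogus token.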
Source: https://stackoverflow.com/questions/28113532/build-a-simple-parser-that-is-able-to-parse-different-date-formats-using-pyparse