I'm trying to use pyparsing to parse function calls in the form:
f(x, y)
That's easy. But since it's a recursive-descent p
Nice catch on figuring out that identifier was masking expression in your definition of arg. Here are some other tips on your parser:
x + ZeroOrMore(',' + x) is a very common pattern in pyparsing parsers, so pyparsing includes a helper method, delimitedList, which allows you to replace that expression with delimitedList(x). Actually, delimitedList does one other thing: it suppresses the delimiting commas (or other delimiter, if given using the optional delim argument), based on the notion that the delimiters are useful at parsing time but are just clutter tokens when trying to sift through the parsed data afterwards. So you can rewrite args as args = delimitedList(arg), and you will get just the args in a list, with no commas to have to "step over".
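To see the difference in the returned tokens, here is a minimal sketch. The x = Word(alphas) element and the sample strings are just stand-ins for illustration, not part of your parser:

```python
from pyparsing import Word, ZeroOrMore, alphas, delimitedList

# Stand-in element for illustration only.
x = Word(alphas)

# Hand-written comma-separated list: the commas stay in the results.
manual = x + ZeroOrMore(',' + x)
manual_tokens = manual.parseString("a, b, c").asList()

# delimitedList matches the same input but suppresses the delimiters.
helper_tokens = delimitedList(x).parseString("a, b, c").asList()

# The optional delim argument selects a different delimiter.
semi_tokens = delimitedList(x, delim=";").parseString("a; b; c").asList()

print(manual_tokens)  # ['a', ',', 'b', ',', 'c']
print(helper_tokens)  # ['a', 'b', 'c']
print(semi_tokens)    # ['a', 'b', 'c']
```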
You can use the Group class to create actual structure in your parsed tokens. This will build your nesting hierarchy for you, without having to walk this list looking for '(' and ')' to tell you when you've gone down a level in the function nesting:
arg = Group(expression) | identifier | integer
expression << functor + Group(lparen + args + rparen)
Since your args are being Grouped for you, you can further suppress the parens, since, like the delimiting commas, they do their job during parsing, but with grouping of your tokens they are no longer necessary:
lparen = Literal("(").suppress()
rparen = Literal(")").suppress()
I assume 'h()' is a valid function call, just with no args. You can allow args to be optional using Optional:
expression << functor + Group(lparen + Optional(args) + rparen)
Now you can parse "f(g(x), y, h())".
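Putting the pieces together, a sketch of the whole parser might look like the following. The definitions of identifier, integer, and functor are my assumptions, since I don't have yours in front of me; substitute your own:

```python
from pyparsing import (Forward, Group, Literal, Optional, Word,
                       alphanums, alphas, delimitedList, nums)

# Assumed token definitions; replace with the ones from your parser.
identifier = Word(alphas, alphanums + "_")
integer = Word(nums)
functor = identifier

# Parens are needed while parsing but suppressed from the results.
lparen = Literal("(").suppress()
rparen = Literal(")").suppress()

expression = Forward()
# expression comes first so that identifier does not mask it.
arg = Group(expression) | identifier | integer
args = delimitedList(arg)
# Optional(args) allows calls with an empty argument list, such as h().
expression << functor + Group(lparen + Optional(args) + rparen)

result = expression.parseString("f(g(x), y, h())").asList()
print(result)  # ['f', [['g', ['x']], 'y', ['h', []]]]
```

Each function call comes back as its name followed by a nested list of its arguments, so the nesting of the lists mirrors the nesting of the calls.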
Welcome to pyparsing!
Paul's post helped a lot. Just for the reference of others, the same approach can be used to define nested control-flow blocks, such as the if block here (simplified pseudo-parser, to show the structure):
sep = Literal(';')
if_ = Keyword('if')
then_ = Keyword('then')
elif_ = Keyword('elif')
end_ = Keyword('end')
if_block = Forward()
do_block = Forward()
stmt = other | if_block
stmts = OneOrMore(stmt + sep)
case = Group(guard + then_ + stmts)
cases = case + OneOrMore(elif_ + case)
if_block << if_ + cases + end_
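For anyone who wants to run this, here is one way the pseudo-parser could be filled in. The guard and other placeholders are hypothetical (plain names that are not keywords), and I have swapped OneOrMore for ZeroOrMore in cases so that an if block without any elif also parses; adjust both to your own grammar:

```python
from pyparsing import (Forward, Group, Keyword, Literal, OneOrMore, Word,
                       ZeroOrMore, alphas)

sep = Literal(';')
if_ = Keyword('if')
then_ = Keyword('then')
elif_ = Keyword('elif')
end_ = Keyword('end')

# Hypothetical placeholders: a bare name that is not one of the keywords.
keyword = if_ | then_ | elif_ | end_
name = ~keyword + Word(alphas)
guard = name
other = name

if_block = Forward()
stmt = other | if_block
stmts = OneOrMore(stmt + sep)
case = Group(guard + then_ + stmts)
# ZeroOrMore (my change) lets a plain if/then/end block parse as well.
cases = case + ZeroOrMore(elif_ + case)
if_block << if_ + cases + end_

text = "if a then b; elif c then if d then e; end; end"
tokens = if_block.parseString(text, parseAll=True).asList()
print(tokens)
```

Because stmt refers back to if_block through the Forward, the inner if nests inside the Group of the elif case, just as Group nests the function arguments above.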
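The definition of arg should be ordered so that an alternative whose beginning is matched by another alternative comes first (leftmost). The | operator takes the first alternative that matches, and an expression begins with an identifier, so expression must come before identifier: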
arg = expression | identifier | integer
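A small sketch of why the order matters. Here call is a hypothetical stand-in for expression that only matches an empty argument list:

```python
from pyparsing import Literal, Word, alphas

identifier = Word(alphas)
# Hypothetical stand-in for expression: a name followed by "()".
call = identifier + Literal("(") + Literal(")")

# | returns the first alternative that matches, not the longest one.
short_first = identifier | call
long_first = call | identifier

short_tokens = short_first.parseString("h()").asList()
long_tokens = long_first.parseString("h()").asList()

print(short_tokens)  # ['h']            identifier wins, "()" is left unparsed
print(long_tokens)   # ['h', '(', ')']  the whole call is matched
```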