context in pyparsing parse actions besides globals

Question

I'd like to be able to parse two (or any number of) expressions, each with its own set of variable definitions or other context.

There doesn't seem to be an obvious way to associate a context with a particular invocation of pyparsing.ParseExpression.parseString(). The most natural way seems to be to use an instance method of some class as the parse action. The problem with this approach is that the grammar must be redefined for each parse context (for instance, in the class's __init__), which seems terribly inefficient.

Using pyparsing.ParseExpression.copy() on the rules doesn't help; the individual expressions get cloned all right, but the sub-expressions they are composed from don't get updated in any obvious way, and so none of the parse actions of any nested expression gets invoked.

The only other way I can think of to get this effect would be to define a grammar that returns a context-less abstract parse tree and then process it in a second step. This seems awkward even for simple grammars: it would be nice to just raise an exception the moment an unrecognized name is used, and it still won't parse languages like C, which actually require context about what came before to know which rule matched.
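
For reference, the two-step approach might look roughly like the sketch below. The tree is written by hand here rather than produced by a pyparsing grammar, and the tuple node layout, OPS and evaluate are illustrative names, not pyparsing API.

```python
import operator

# Hypothetical node layout: ("num", value), ("var", name), or (op, left, right)
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate(node, context):
    """Second step: walk a context-free tree using one particular context."""
    kind = node[0]
    if kind == "num":
        return node[1]
    if kind == "var":
        name = node[1]
        if name not in context:
            # The error is only detected here, after parsing has finished
            raise NameError("invalid variable %r" % name)
        return context[name]
    _, left, right = node
    return OPS[kind](evaluate(left, context), evaluate(right, context))


# What a context-free first step could return for "m*x"
tree = ("*", ("var", "m"), ("var", "x"))

print(evaluate(tree, {"m": 100, "x": 12}))
print(evaluate(tree, {"m": 2, "x": 3}))
```

The same tree is reused for both contexts, which is the appeal of the approach; the cost is that an unknown name is only reported in the second step.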

Is there another way of injecting context (without using a global variable, of course) into the parse actions of pyparsing expressions?

Answer 1:

I don't know if this will necessarily answer your question, but it is one approach to customizing a parser to a context:

from pyparsing import Word, alphas, alphanums, nums, oneOf, ParseFatalException

var = Word(alphas+'_', alphanums+'_').setName("identifier")
integer = Word(nums).setName("integer").setParseAction(lambda t:int(t[0]))
operand = integer | var

operator = oneOf("+ - * /")
ops = {'+' : lambda a,b:a+b,
       '-' : lambda a,b:a-b,
       '*' : lambda a,b:a*b,
       '/' : lambda a,b:a/b if b else "inf",
        }

binop = operand + operator + operand

# add parse action that evaluates the binary operator by passing 
# the two operands to the appropriate binary function defined in ops
binop.setParseAction(lambda t: ops[t[1]](t[0],t[2]))

# closure to return a context-specific parse action
def make_var_parseAction(context):
    def pa(s,l,t):
        varname = t[0]
        try:
            return context[varname]
        except KeyError:
            raise ParseFatalException("invalid variable '%s'" % varname)
    return pa

def eval_binop(e, **kwargs):
    var.setParseAction(make_var_parseAction(kwargs))
    try:
        print(binop.parseString(e)[0])
    except Exception as pe:
        print(pe)

eval_binop("m*x", m=100, x=12, b=5)
eval_binop("z*x", m=100, x=12, b=5)

Prints

1200
invalid variable 'z' (at char 0), (line:1, col:1)
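
The closure part can be checked on its own, without pyparsing, by calling the generated actions directly with the (s, loc, toks) arguments a parse action receives. This is only a sketch: UnknownVariable is a hypothetical stand-in for ParseFatalException, and make_var_action mirrors make_var_parseAction above.

```python
class UnknownVariable(Exception):
    """Hypothetical stand-in for pyparsing's ParseFatalException."""


def make_var_action(context):
    # Each call returns a new function bound to its own context dict
    def action(s, loc, toks):
        name = toks[0]
        try:
            return context[name]
        except KeyError:
            raise UnknownVariable("invalid variable %r" % name)
    return action


first = make_var_action({"m": 100, "x": 12})
second = make_var_action({"x": -1})

# Same variable name, different contexts, different results
print(first("m*x", 2, ["x"]))
print(second("m*x", 2, ["x"]))
```

Note that eval_binop calls setParseAction on the shared var expression each time, so the most recently installed closure is the one in effect. That is fine for sequential calls, but two parses that overlap would see the same context.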

Answer 2:

A bit late, but googling "pyparsing reentrancy" brings up this topic, so here is my answer.
I've solved the problem of parser instance reuse/reentrancy by attaching the context to the string being parsed. You subclass str, put your context in an attribute of the new str class, pass an instance of it to pyparsing, and get the context back in a parse action.

Python 2.7:

from pyparsing import LineStart, LineEnd, Word, alphas, Optional, Regex, Keyword, OneOrMore

# subclass str; note that unicode is not handled
class SpecStr(str):
    context = None  # will be set in spec_string() below
    # override as pyparsing calls str.expandtabs by default
    def expandtabs(self, tabs=8):
        ret = type(self)(super(SpecStr, self).expandtabs(tabs))
        ret.context = self.context
        return ret    

# set context here rather than in the constructor
# to avoid messing with str.__new__ and super()
def spec_string(s, context):
    ret = SpecStr(s)
    ret.context = context
    return ret    

class Actor(object):
    def __init__(self):
        self.namespace = {}

    def pair_parsed(self, instring, loc, tok):
        self.namespace[tok.key] = tok.value

    def include_parsed(self, instring, loc, tok):
        # doc = open(tok.filename.strip()).read()  # would use this line in real life
        doc = included_doc  # included_doc is defined below
        parse(doc, self)  # <<<<< recursion

def make_parser(actor_type):
    def make_action(fun):  # expects fun to be an unbound method of Actor
        def action(instring, loc, tok):
            if isinstance(instring, SpecStr):
                return fun(instring.context, instring, loc, tok)
            return None  # returning None from a parse action means
            # the tokens are left unchanged

        return action

    # Sample grammar: a sequence of lines, 
    # each line is either 'key=value' pair or '#include filename'
    Ident = Word(alphas)
    RestOfLine = Regex('.*')
    Pair = (Ident('key') + '=' +
            RestOfLine('value')).setParseAction(make_action(actor_type.pair_parsed))
    Include = (Keyword('#include') +
               RestOfLine('filename')).setParseAction(make_action(actor_type.include_parsed))
    Line = (LineStart() + Optional(Pair | Include) + LineEnd())
    Document = OneOrMore(Line)
    return Document

Parser = make_parser(Actor)  

def parse(instring, actor=None):
    if actor is not None:
        instring = spec_string(instring, actor)
    return Parser.parseString(instring)


included_doc = 'parrot=dead'
main_doc = """\
#include included_doc
ham = None
spam = ham"""

# parsing without context is ok
print 'parsed data:', parse(main_doc)

actor = Actor()
parse(main_doc, actor)
print 'resulting namespace:', actor.namespace

yields

['#include', 'included_doc', '\n', 'ham', '=', 'None', '\n', 'spam', '=', 'ham']
{'ham': 'None', 'parrot': 'dead', 'spam': 'ham'}

This approach makes the Parser itself perfectly reusable and reentrant. The pyparsing internals are generally reentrant too, as long as you don't touch ParserElement's static fields. The only drawback is that pyparsing resets its packrat cache on each call to parseString, but this can be resolved by overriding SpecStr.__hash__ (to make it hashable like object, not str) and some monkeypatching. On my dataset this is not an issue at all as the performance hit is negligible and this even favors memory usage.
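
On Python 3 the string subclass could be written along these lines. This is a sketch under assumptions, not a port of the code above: ContextStr is a hypothetical name, and unlike spec_string() it sets the context in __new__.

```python
class ContextStr(str):
    """A str that carries an arbitrary context object (hypothetical name)."""

    def __new__(cls, value, context=None):
        self = super().__new__(cls, value)
        self.context = context
        return self

    def expandtabs(self, tabsize=8):
        # str.expandtabs returns a plain str, so re-wrap the result
        # to keep the context attached
        return type(self)(super().expandtabs(tabsize), self.context)


ctx = {"namespace": {}}
s = ContextStr("key\t= value", ctx)
expanded = s.expandtabs()

print(type(expanded).__name__)
print(expanded.context is ctx)
```

Only expandtabs is overridden here; other str operations such as slicing or concatenation still return a plain str without the context.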

Answer 3:

How about letting the parse actions be instance methods like you say, but just not reinstantiating the class? Instead, when you want to parse another translation unit, reset the context in the same parser object.

Something like this:

from pyparsing import Keyword, Word, OneOrMore, alphas, nums

class Parser:
    def __init__(self):
        ident = Word(alphas)
        identval = Word(alphas).setParseAction(self.identval_act)
        numlit = Word(nums).setParseAction(self.numlit_act)
        expr = identval | numlit
        letstmt = (Keyword("let") + ident + expr).setParseAction(self.letstmt_act)
        printstmt = (Keyword("print") + expr).setParseAction(self.printstmt_act)
        program = OneOrMore(letstmt | printstmt)

        self.symtab = {}
        self.grammar = program

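    # Each action receives the matched tokens. Tuple parameters in a
    # signature are Python 2 only, so unpack inside the body instead.
    def identval_act(self, toks):
        return self.symtab[toks[0]]
    def numlit_act(self, toks):
        return int(toks[0])
    def letstmt_act(self, toks):
        _, ident, val = toks
        self.symtab[ident] = val
    def printstmt_act(self, toks):
        print(toks[1])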

    def reset(self):
        self.symtab = {}

    def parse(self, s):
        self.grammar.parseString(s)

P = Parser()
P.parse("""let foo 10
print foo
let bar foo
print bar
""")

print(P.symtab)
P.parse("print foo") # context is kept.

P.reset()
P.parse("print foo") # but here it is reset and this fails

In this example "symtab" is your context.

Of course this fails badly if you try to do parallel parsing in different threads, but I don't see how that could work in a sane way with shared parse actions.

Answer 4:

I ran into this exact limitation, and used threading.local() to attach parser context information as thread-local storage. In my case I keep a stack of parsed terms that is pushed and popped inside the parse action functions, but obviously you can also use it to store a reference to a class instance or whatever.

It looks somewhat like this:

import threading

__tls = threading.local()

def parse_term(t):
  __tls.stack.append(convert_term(t))

def parse_concatenation(t):
  rhs = __tls.stack.pop()
  lhs = __tls.stack.pop()

  __tls.stack.append(convert_concatenation(t, lhs, rhs))

# parse a string s using grammar EXPR, that has parse actions parse_term and
# parse_concatenation for the rules that parse expression terms and concatenations
def parse(s):
  __tls.stack = []

  parse_result = EXPR.parseString(s)

  return __tls.stack.pop()

In my case all of the thread-local storage stuff, setting up the stack, the parse actions and the grammar itself are pushed outside of the public API, so from the outside nobody can see what's going on or mess with it. There simply is a parse method somewhere in the API that takes a string and returns a parsed, converted representation of the query, that is thread-safe and doesn't have to re-create the grammar for every parse call.
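
A self-contained way to see the isolation is sketched below. There is no pyparsing in it: the loop over s.split() is a hypothetical stand-in for EXPR.parseString(s) firing the parse actions, and push_term and concat stand in for parse_term and parse_concatenation.

```python
import threading

_tls = threading.local()


def push_term(text):
    # Stand-in for convert_term: just upper-case the text
    _tls.stack.append(text.upper())


def concat():
    rhs = _tls.stack.pop()
    lhs = _tls.stack.pop()
    _tls.stack.append(lhs + rhs)


def parse(s):
    _tls.stack = []  # fresh stack, visible only to the calling thread
    for word in s.split():  # stand-in for the grammar invoking the actions
        push_term(word)
        if len(_tls.stack) == 2:
            concat()
    return _tls.stack.pop()


results = {}


def worker(name, text):
    results[name] = parse(text)


jobs = [("a", "foo bar baz"), ("b", "x y")]
threads = [threading.Thread(target=worker, args=job) for job in jobs]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)
```

Each thread assigns its own _tls.stack, so the two parses never see each other's terms even though the action functions are shared.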

Source: https://stackoverflow.com/questions/8694849/context-in-pyparsing-parse-actions-besides-globals

Tags

python

pyparsing