Question

I'm new to pyparsing and am trying to work out how to parse the draw (and similar) attributes in xdot files. Several items give the number of following elements as an integer at the start, somewhat like netstrings. I've looked at some of the sample code for netstring-like constructs, but it does not seem to be working for me.

Here are some samples:

Polygon with 3 points (the 3 after the P indicates the number of points following):
P 3 811 190 815 180 806 185 should parse to 'P', [[811, 190], [815, 180], [806, 185]]

Polygon with 2 points:
P 2 811 190 815 180 806 185 should parse to 'P', [[811, 190], [815, 180]] (with unparsed text at the end)

Pen fill colour (the 4 after the C indicates the number of characters after the '-' to consume):
C 4 -blue should parse to 'C', 'blue'

Updated Info:
I think I was misleading by putting the examples on their own lines, without more context. Here is a real example:

S 5 -solid S 15 -setlinewidth(1) c 5 -black C 5 -black P 3 690 181 680 179 687 187

See http://www.graphviz.org/doc/info/output.html#d:xdot for the actual spec.

Note that there could be significant spaces in the text fields - setlinewidth(1) above could be "abcd efgh hijk " and as long as it was exactly 15 characters, it should be linked with the 'S' tag. There should be exactly 7 numbers (the initial counter + 3 pairs) after the 'P' tag, and anything else should raise a parse error, since there could be more tags following (on the same line), but numbers by themselves are not valid.

Hopefully that makes things a little clearer.

Answer 1:

Well, this is what I came up with in the end, using scanString.

from pyparsing import Combine, Group, Literal, Optional, Word, nums

# Numbers are converted to int/float as they are parsed.
int_ = Word(nums).setParseAction(lambda t: int(t[0]))
float_ = Combine(Word(nums) + Optional('.' + Optional(Word(nums)))).setParseAction(lambda t: float(t[0]))
point = Group(int_ * 2).setParseAction(lambda t: tuple(t[0]))
ellipse = ((Literal('E') ^ 'e') + point + int_ + int_).setResultsName('ellipse')
# Headers only: the counted payload is consumed separately in the loop below.
n_points_start = (Word('PpLBb', exact=1) + int_).setResultsName('n_points')
text_start = ((('T' + point + int_*3) ^ ('F' + float_ + int_) ^ (Word('CcS', exact=1) + int_)) + '-').setResultsName('text')
xdot_attr_parser = ellipse ^ n_points_start ^ text_start

def parse_xdot_extended_attributes(data):
    results = []
    while True:
        try:
            # Find the next op header; scanString yields (tokens, start, end).
            tokens, start, end = next(xdot_attr_parser.scanString(data, maxMatches=1))
            data = data[end:]
            name = tokens.getName()
            if name == 'n_points':
                # The header ends with the point count; parse exactly that many points.
                number_to_get = int(tokens[-1])
                points, start, end = next((point * number_to_get).scanString(data, maxMatches=1))
                result = tokens[:1]
                result.append(points[:])
                results.append(result)
                data = data[end:]
            elif name == 'text':
                # The header ends with the length and '-'; slice that many raw characters.
                number_to_get = int(tokens[-2])
                text, data = data[:number_to_get], data[number_to_get:]
                result = tokens[:-2]
                result.append(text)
                results.append(result)
            else:
                results.append(tokens)
        except StopIteration:
            # No further op was found in the remaining input.
            break
    return results
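
Note that scanString skips over text it cannot match, so a stray number between ops would be passed over silently rather than rejected. For comparison, here is a minimal hand-written sketch of the same count-then-consume idea that does raise on unexpected input. It uses only the standard library, not pyparsing; the name parse_draw_attr is hypothetical, and it assumes only the S/C/c text ops and P/p/L/B/b point ops with non-negative integer coordinates, as in the question's samples.

```python
import re

_TOKEN = re.compile(r"\s*(\S+)")   # next whitespace-delimited token
_DASH = re.compile(r"\s*-")        # the '-' that introduces a counted text field

def parse_draw_attr(data):
    """Hypothetical helper: parse a subset of xdot draw ops strictly."""
    pos, results = 0, []

    def next_token():
        nonlocal pos
        m = _TOKEN.match(data, pos)
        if m is None:
            raise ValueError("unexpected end of input at offset %d" % pos)
        pos = m.end()
        return m.group(1)

    def next_int():
        tok = next_token()
        if not tok.isdecimal():
            raise ValueError("expected an integer, got %r" % tok)
        return int(tok)

    while data[pos:].strip():
        op = next_token()
        if op in ("S", "C", "c"):
            n = next_int()
            m = _DASH.match(data, pos)
            if m is None:
                raise ValueError("expected '-' after the length of %r" % op)
            # Take exactly n raw characters after the '-', spaces included.
            text = data[m.end():m.end() + n]
            if len(text) < n:
                raise ValueError("text field for %r is shorter than %d" % (op, n))
            pos = m.end() + n
            results.append([op, text])
        elif op in ("P", "p", "L", "B", "b"):
            n = next_int()
            # Read exactly n coordinate pairs.
            results.append([op, [[next_int(), next_int()] for _ in range(n)]])
        else:
            # A bare number (or any unknown tag) where an op is expected.
            raise ValueError("unexpected token %r" % op)
    return results
```

With this sketch, the line `P 2 811 190 815 180 806 185` raises a ValueError at `806`, because a number on its own is not a valid op.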

Answer 2:

Following the OP's edit, the answer below is no longer complete.

I'm going to try and get to the core of your question here and ignore the finer details. Hopefully it will put you on the right track to the rest of your grammar. Essentially you are asking, given the two lines:

P 3 811 190 815 180 806 185
P 2 811 190 815 180 806 185

how can you parse the data such that in the second line only two points are read? Personally, I would read all of the data and post-parse. You can make the job immeasurably easier for yourself if you name the results. For example:

from pyparsing import *

EOL = LineEnd().suppress()

number = Word(nums).setParseAction(lambda x: int(x[0]))
point_pair = Group(number + number)

poly_flag  = Group(Literal("P") + number("length"))("flag")
poly_type  = poly_flag + Group(OneOrMore(point_pair))("data")

xdot_line = Group(poly_type) + EOL
grammar   = OneOrMore(xdot_line)

Note that the results are named data, flag and length; this will come in handy later. Let's parse and process the string:

S = "P 3 811 190 815 180 806 185\nP 2 811 190 815 180 806 185\n"
P = grammar.parseString(S)

for line in P:
    # Trim the point list to the length declared in the flag.
    L = line["flag"]["length"]
    while len(line["data"]) > L:
        line["data"].pop()
    print(line)

This gives the following structured result:

[['P', 3], [[811, 190], [815, 180], [806, 185]]]
[['P', 2], [[811, 190], [815, 180]]]
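
The trimming loop silently drops the extra points. The updated question asks for an error when the declared count and the data disagree, so a stricter post-parse step might look like the sketch below. check_counted is a hypothetical helper, and it assumes plain nested lists of the shape shown above rather than pyparsing result objects.

```python
def check_counted(line):
    """Hypothetical helper: return (tag, points), or raise if the count is wrong."""
    (tag, count), points = line
    if len(points) != count:
        raise ValueError(
            "%s declares %d points but %d were parsed" % (tag, count, len(points))
        )
    return tag, points
```

Applied to the untrimmed second line, `[['P', 2], [[811, 190], [815, 180], [806, 185]]]`, this raises a ValueError instead of discarding the last point.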

Extending the grammar

From here, you can build the pieces of the grammar independently, one by one. Each time you add a new type, add it to xdot_line. For example, with pen_fill_type standing for an expression you define yourself:

xdot_line = Group(poly_type | pen_fill_type) + EOL

Source: https://stackoverflow.com/questions/9898984/parsing-xdot-draw-attributes-with-pyparsing

Tags

pyparsing
