Correctly set priorities between rules and terminals in a grammar for lark

Submitted by 给你一囗甜甜゛ on 2019-12-11 06:56:00

Question


This is my first time writing a parser using a grammar and a parser generator. I want to parse an ASN.1-like text format using the lark Python module.

Here is an example of the data I'm trying to parse:

text = """
start_thing {
  literal {
    length 100,
    fuzz lim unk,
    seq-data gap {
      type fragment,
      linkage linked,
      linkage-evidence {
        {
          type unspecified
        }
      }
    }
  },
  loc int {
    from 0,
    to 1093,
    strand plus,
    id gi 384632836
  }
}
"""

The structure can contain all sorts of nodes, and I can't know in advance exactly what tags or combination of tags I should expect. However, there are some structures I want to be able to parse, like the "loc int {...}" part.

Here is the grammar I tried, where I used numbers to define priorities:

grammar = """\
thing: "start_thing" node
strand_info.5: "strand plus"
    | "strand minus"
locus_info.4: "loc int" "{" "from" INT "," "to" INT "," strand_info "," "id gi" INT "}"
nodes.1: node?
    | node ("," node)*
node.1: locus_info
    | TAGS? INT           -> intinfo
    | TAGS? "{" nodes "}" -> subnodes
    | TAGS                -> onlytags
TAGS.2: TAGWORD (WS TAGWORD)*
TAGWORD.3: ("_"|LETTER)("_"|"-"|LETTER|DIGIT)*
%import common.WS
%import common.LETTER
%import common.DIGIT
%import common.INT
%ignore WS
"""

I thought the priorities (the numbers appended to the rule and terminal names) would be enough for the "loc int" part to be recognized in preference to a more general kind of node. However, when I build a parser from the above grammar and run it on the piece of text above, this part is parsed as a subnodes instead of a locus_info:

from lark import Lark

parser = Lark(grammar, start="thing", ambiguity="explicit")
parsed = parser.parse(text)
print(parsed.pretty())

I obtain the following:

thing
  subnodes
    nodes
      subnodes
        literal
        nodes
          intinfo
            length
            100
          onlytags  fuzz lim unk
          subnodes
            seq-data gap
            nodes
              onlytags  type fragment
              onlytags  linkage linked
              subnodes
                linkage-evidence
                nodes
                  subnodes
                    nodes
                      onlytags  type unspecified
      subnodes
        loc int
        nodes
          intinfo
            from
            0
          intinfo
            to
            1093
          onlytags  strand plus
          intinfo
            id gi
            384632836

What am I doing wrong?

Note: I've seen a related question (Priority in grammar using Lark), but I do not see how to apply its answers to my problem. I don't think I can fully disambiguate my grammar (there are too many possible cases in the real data), and I didn't understand what the ambiguity="explicit" option is supposed to do.


Edit: inverting priorities

I tried inverting priorities, as follows:

grammar = """\
thing: "start_thing" node
strand_info.1: "strand plus"
    | "strand minus"
locus_info.2: "loc int" "{" "from" INT "," "to" INT "," strand_info "," "id gi" INT "}"
nodes.5: node?
    | node ("," node)*
node.5: locus_info
    | TAGS? INT           -> intinfo
    | TAGS? "{" nodes "}" -> subnodes
    | TAGS                -> onlytags
TAGS.4: TAGWORD (WS TAGWORD)*
TAGWORD.3: ("_"|LETTER)("_"|"-"|LETTER|DIGIT)*
%import common.WS
%import common.LETTER
%import common.DIGIT
%import common.INT
%ignore WS
"""
parser = Lark(grammar, start="thing", ambiguity="explicit")
parsed = parser.parse(text)
print(parsed.pretty())

However, the output is exactly the same. It is as if the priorities were ignored, or as if there were actually no ambiguities because my locus_info rule is not correctly specified.
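One way to check whether the grammar really is ambiguous is to test if the keyword literals used by locus_info and strand_info can also be matched by the TAGS terminal. The sketch below uses only the standard re module, not lark. The two regular expressions are hand-written approximations of TAGWORD and TAGS, assuming that common.LETTER is [A-Za-z], common.DIGIT is [0-9] and common.WS is one or more of space, tab, form feed, carriage return or newline. The names KEYWORDS and overlapping_keywords are made up for this illustration.

```python
import re

# Hand-written approximations of the grammar's terminals (assumption:
# LETTER = [A-Za-z], DIGIT = [0-9], WS = one or more of [ \t\f\r\n]).
TAGWORD = r"[_A-Za-z][_\-A-Za-z0-9]*"
TAGS = re.compile(rf"{TAGWORD}(?:[ \t\f\r\n]+{TAGWORD})*")

# String literals used by the locus_info and strand_info rules.
KEYWORDS = ["loc int", "from", "to", "strand plus", "strand minus", "id gi"]


def overlapping_keywords(keywords):
    """Return the keywords that the TAGS pattern matches in full."""
    return [kw for kw in keywords if TAGS.fullmatch(kw)]


overlaps = overlapping_keywords(KEYWORDS)
print(overlaps)
```

If every keyword shows up in the printed list, each of them can be read either as a literal of locus_info or as a TAGS token, so the overlap is between terminals and not only between rules. This check says nothing about how lark itself resolves that overlap, which may depend on the parser and lexer options in use.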


Answer 1:


I think you should change your priorities. "locus_info.4" is the most specific rule, so it should have the highest priority.



Source: https://stackoverflow.com/questions/50928531/correctly-set-priorities-between-rules-and-terminals-in-a-grammar-for-lark
