Implementing parser for markdown-like language

旧时难觅i 2021-02-06 03:16

I have a markup language that is similar to Markdown and to the one used by SO.

The legacy parser was based on regexes and was a complete nightmare to maintain, so I've come up w

1 answer
  • 2021-02-06 03:37

    If one construct can contain another, you normally treat the markers as separate tokens and then nest them in the grammar. Lepl (http://www.acooke.org/lepl, which I wrote) and PyParsing (probably the most popular pure-Python parser) both allow you to nest things recursively.

    So in Lepl you could write code something like:

    from lepl import Token, Delayed

    # these are tokens (defined as regexps)
    stg_marker = Token(r'\*\*')
    emp_marker = Token(r'\*') # tokens are longest match, so strong is preferred if possible
    spo_marker = Token(r'%%')
    ....
    # grammar rules combine tokens
    contents = Delayed() # this will be defined later and lets us recurse
    strong = stg_marker + contents + stg_marker
    emphasis = emp_marker + contents + emp_marker
    spoiler = spo_marker + contents + spo_marker
    other_stuff = .....
    contents += strong | emphasis | spoiler | other_stuff # this defines contents recursively
    

    Then you can see, I hope, how contents will match nested use of strong, emphasis, etc.
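The same idea (separate tokens, then a rule that refers back to itself) can be sketched without any parsing library. The following is only an illustration in plain Python, not Lepl code: the names `MARKERS`, `tokenize`, `parse` and `parse_markup` are hypothetical, and it assumes that each marker closes the nearest open marker of the same kind.

```python
import re

# Hypothetical marker table (marker text -> node name); not part of Lepl.
MARKERS = {"**": "strong", "*": "emphasis", "%%": "spoiler"}

# Alternatives are tried left to right, so "**" is preferred over "*".
# A lone "%" that is not part of "%%" falls through to the last branch
# and is kept as ordinary text.
TOKEN_RE = re.compile(r"\*\*|\*|%%|[^*%]+|%")


def tokenize(text):
    """Split text into marker tokens and runs of plain text."""
    return TOKEN_RE.findall(text)


def parse(tokens, pos=0, closing=None):
    """Parse tokens from pos until the `closing` marker or end of input.

    Returns (nodes, next_pos). A node is either a plain string or a
    (name, children) tuple, so nesting shows up as nested tuples.
    """
    nodes = []
    while pos < len(tokens):
        tok = tokens[pos]
        if tok == closing:
            # Found the marker that ends the current construct.
            return nodes, pos + 1
        if tok in MARKERS:
            # Opening marker: recurse, mirroring "marker + contents + marker".
            children, pos = parse(tokens, pos + 1, closing=tok)
            nodes.append((MARKERS[tok], children))
        else:
            nodes.append(tok)
            pos += 1
    if closing is not None:
        raise ValueError("unclosed marker: %r" % closing)
    return nodes, pos


def parse_markup(text):
    """Convenience wrapper returning only the list of nodes."""
    nodes, _ = parse(tokenize(text))
    return nodes


if __name__ == "__main__":
    print(parse_markup("a **b *c* %%d%%** e"))
```

A real grammar would also need rules for escaping, for markers that should stay literal when unmatched, and for block-level constructs; this sketch simply raises `ValueError` on an unclosed marker.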

    There is much more than this to do for your final solution, and efficiency could be an issue in any pure-Python parser. (Some parsers are implemented in C but callable from Python. These will be faster, but may be trickier to use; I can't recommend any because I haven't used them.)
