Implementing a parser for a Markdown-like language

I have a markup language that is similar to Markdown and to the one used by SO.

The legacy parser was based on regexes and was a complete nightmare to maintain, so I've come up w

1 Answer

天命终不由人

2021-02-06 03:37
If one thing includes another, then normally you treat them as separate tokens and then nest them in the grammar. Lepl (http://www.acooke.org/lepl which I wrote) and PyParsing (which is probably the most popular pure-Python parser) both allow you to nest things recursively.

So in Lepl you could write code something like:
```
from lepl import Token, Delayed

# these are tokens (defined as regexps)
stg_marker = Token(r'\*\*')
emp_marker = Token(r'\*') # tokens are longest match, so strong is preferred if possible
spo_marker = Token(r'%%')
....
# grammar rules combine tokens
contents = Delayed() # this will be defined later and lets us recurse
strong = stg_marker + contents + stg_marker
emphasis = emp_marker + contents + emp_marker
spoiler = spo_marker + contents + spo_marker
other_stuff = .....
contents += strong | emphasis | spoiler | other_stuff # this defines contents recursively
```
Then you can see, I hope, how contents will match nested use of strong, emphasis, etc.
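
For readers without Lepl installed, the same idea (separate tokens first, then a grammar that recurses into itself) can be sketched with only the standard library. This is a minimal illustration rather than the Lepl approach itself: the names `TOKEN_RE`, `NODE_NAMES`, `tokenize` and `parse` are made up for the example, and it assumes the three markers shown above plus plain text. It does no error handling, so an unclosed marker is silently closed at the end of input, and crossed markers are not detected.

```
import re

# Hypothetical marker set: "**" strong, "*" emphasis, "%%" spoiler.
# "**" is listed before "*" so the longer marker is tried first.
TOKEN_RE = re.compile(r'\*\*|\*|%%|[^*%]+|%')
NODE_NAMES = {'**': 'strong', '*': 'emphasis', '%%': 'spoiler'}


def tokenize(text):
    """Split text into marker tokens and runs of plain text."""
    return TOKEN_RE.findall(text)


def parse(tokens, pos=0, closer=None):
    """Parse tokens from pos until `closer` (or the end of input).

    Returns (nodes, next_pos). A node is either a plain string or a
    (name, children) tuple; children are produced by the recursive call,
    which is what lets markers nest inside each other.
    """
    nodes = []
    while pos < len(tokens):
        tok = tokens[pos]
        if tok == closer:
            # Found the marker that closes the current element.
            return nodes, pos + 1
        if tok in NODE_NAMES:
            # Opening marker: recurse to collect the nested contents.
            children, pos = parse(tokens, pos + 1, closer=tok)
            nodes.append((NODE_NAMES[tok], children))
        else:
            nodes.append(tok)
            pos += 1
    return nodes, pos
```

Calling `parse(tokenize("**a *b* c**"))` returns the tree `[('strong', ['a ', ('emphasis', ['b']), ' c'])]` together with the final position, which shows emphasis nested inside strong.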

There's much more than this to do for your final solution, and efficiency could be an issue in any pure-Python parser. There are some parsers that are implemented in C but callable from Python; these will be faster, but may be trickier to use. I can't recommend any because I haven't used them.