Extensive documentation on how to write a lexer for Pygments? [closed]

Submitted by 旧时模样 on 2019-12-03 06:31:40

If you just wanted to highlight the keywords, you'd start with this (replacing the keywords with your own list of Stata keywords):

import re

from pygments.lexer import RegexLexer
from pygments.token import Keyword

class StataLexer(RegexLexer):

    name = 'Stata'
    aliases = ['stata']
    filenames = ['*.stata']          # Pygments expects a list of filename patterns
    flags = re.MULTILINE | re.DOTALL

    tokens = {
        'root': [
            # Placeholder keyword list (these are actually Scala's keywords);
            # swap in Stata's keywords here.
            (r'(abstract|case|catch|class|do|else|extends|false|final|'
             r'finally|for|forSome|if|implicit|import|lazy|match|new|null|'
             r'object|override|package|private|protected|requires|return|'
             r'sealed|super|this|throw|trait|try|true|type|while|with|'
             r'yield)\b', Keyword),
        ],
    }
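
Once you have a draft class like this, you can try it directly with pygments.highlight, without installing it as a plugin. A quick sketch (the sample snippet is just arbitrary text to tokenize):

from pygments import highlight
from pygments.formatters import TerminalFormatter

# Run the lexer defined above on a small sample and print the
# ANSI-colored result to the terminal.
sample = 'if x > 1 { display "hello" }'
print(highlight(sample, StataLexer(), TerminalFormatter()))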

I think your problem is not that you don't know any Python, but that you don't have much experience writing a lexer or with how a lexer works, because this implementation is fairly straightforward.

Then, if you want to handle more constructs, add more elements to the 'root' list: each is a tuple whose first element is a regular expression and whose second element is the token type to assign to the matched text, as in the sketch below.
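
For example, extra rules for comments, strings, and local macros might look like the following (placed inside the class body alongside the keyword rule). This is only a sketch; the regexes are my guesses at Stata's syntax rather than a verified grammar:

from pygments.token import Comment, Name, String, Text

tokens = {
    'root': [
        # ... the keyword rule shown earlier goes here, followed by:
        (r'//.*?$', Comment.Single),   # line comment to end of line (assumed Stata syntax)
        (r'"[^"\n]*"', String),        # double-quoted string literal
        (r"`\w+'", Name.Variable),     # local macro reference like `x' (assumed)
        (r'\s+', Text),                # whitespace
    ],
}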

I recently attempted to write a Pygments lexer (for BibTeX, which has a simple syntax) and agree with your assessment that the available resources aren't very helpful for people unfamiliar with Python or with general parsing concepts.

What I found to be most helpful was the collection of lexers included with Pygments.

There is a file _mapping.py that lists every recognized language and maps it to its lexer class. To construct my lexer, I looked for languages with constructs similar to the ones I was handling and checked whether I could tease out something useful. Some of the built-in lexers are more complex than I wanted, but others were helpful.
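
If you want to read through the source of a built-in lexer, one way is to look it up by name and print the file that defines it. A small sketch (the 'scala' alias is just an example):

import inspect
from pygments.lexers import get_lexer_by_name

# Look up a built-in lexer by its alias and print the path of the module
# that defines its class, so you can open it and study its token rules.
lexer = get_lexer_by_name('scala')
print(inspect.getsourcefile(type(lexer)))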
