Case-insensitive keyword matching

Submitted by 扶醉桌前 on 2019-12-07 13:44:43

Question


I'm writing a grammar for parsing a computer language, to be used with Parse::Eyapp. Parse::Eyapp is a Perl package that simplifies writing parsers for context-free languages. It is similar to yacc and other LALR parser generators, but has some useful extensions, such as defining tokens in terms of regular expressions.

The language I want to parse uses keywords to denote sections and describe control flow. It also supports identifiers that serve as placeholders for data. An identifier can never have the same name as a keyword.

Now here comes the tricky part: I need to separate keywords from identifiers, but the two may look similar, so I need a regular expression pattern that matches a keyword case-insensitively, and nothing else.
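A minimal illustration of why "and nothing else" matters, written in Python rather than Perl (assumption: Python's `re` module stands in for Perl's regex engine; the keyword `if` and the identifier `iffy` are hypothetical examples). A case-insensitive keyword pattern on its own also matches the beginning of an identifier that merely starts with the keyword:

```python
import re

# Naive keyword token: case-insensitive, but nothing checks what follows it.
# (Hypothetical keyword "if"; (?i:...) is Python's scoped case-insensitive flag.)
NAIVE_IF = re.compile(r"(?i:if)")

def naive_match(text):
    """Return the text the naive pattern consumes at the start of `text`, or None."""
    m = NAIVE_IF.match(text)
    return m.group(0) if m else None

print(naive_match("IF x > 0"))   # IF
print(naive_match("iffy = 1"))   # if  (wrongly splits the identifier "iffy")
```

A tokenizer using this pattern would emit the keyword `if` followed by an identifier `fy`, which is exactly the confusion between keywords and identifiers described above.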

The solution I came up with is the following:

  1. Each keyword is identified by a token of the following form: /((?i)keyword)(?!\w)/
    • (?i) turns on case-insensitive matching for the rest of the enclosing group, i.e. for the keyword itself
    • (?!\w) is a negative lookahead: the match fails if a word character (a letter, digit or underscore) follows the keyword
    • the lookahead is zero-width, so the character after the keyword is not part of the match
  2. Keywords that are a prefix of another keyword are listed after the longer keyword, so that the longer keyword is tried first
  3. The token for matching identifiers comes last, so it only matches when no keyword is recognized
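The three rules above can be sketched as an ordered list of token patterns that is tried front to back at each position. This is a minimal sketch in Python, not a Parse::Eyapp grammar: the keyword list, the token names, the `tokenize` helper and the identifier pattern `[A-Za-z_]\w*` are all hypothetical assumptions chosen for illustration.

```python
import re

# Hypothetical keyword list. Per rule 2, a longer keyword is listed before
# any keyword that is a prefix of it (e.g. "endif" before "end").
KEYWORDS = ["endif", "end", "if", "else", "while"]

# Rule 1: each keyword token is ((?i)keyword)(?!\w).
# Python only accepts a bare (?i) at the very start of a pattern, where it
# applies to the whole pattern, so the scoped form (?i:...) is used instead
# to limit case-insensitivity to the keyword, like Perl's ((?i)keyword).
TOKEN_SPECS = [
    (kw.upper(), re.compile(r"(?i:%s)(?!\w)" % re.escape(kw)))
    for kw in KEYWORDS
]

# Rule 3: the identifier token comes last (assumed identifier syntax).
TOKEN_SPECS.append(("IDENT", re.compile(r"[A-Za-z_]\w*")))

WHITESPACE = re.compile(r"\s+")

def tokenize(text):
    """Return a list of (token_type, lexeme) pairs, trying specs in list order."""
    pos = 0
    tokens = []
    while pos < len(text):
        ws = WHITESPACE.match(text, pos)
        if ws:
            pos = ws.end()  # skip whitespace between tokens
            continue
        for name, pattern in TOKEN_SPECS:
            m = pattern.match(text, pos)
            if m:
                tokens.append((name, m.group(0)))
                pos = m.end()
                break
        else:
            raise SyntaxError("unexpected character %r at offset %d" % (text[pos], pos))
    return tokens
```

For example, `tokenize("IF iffy End endif")` returns `[('IF', 'IF'), ('IDENT', 'iffy'), ('END', 'End'), ('ENDIF', 'endif')]`: `iffy` becomes an identifier because the `(?!\w)` lookahead rejects the keyword `if` when the word character `f` follows it, and the identifier pattern then gets its turn.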

The token definitions and part of the grammar I came up with work well so far, but there is still a lot to do. However, that is not my question.

What I want to ask is: am I on the right track here? Are there better, simpler regular expressions for matching those keywords? Should I stop and use a different approach to language parsing altogether?

The idea of using the tokenizer to match whole strings instead of single characters came from the Parse::Eyapp documentation, by the way. I started with a character-by-character grammar, but that approach wasn't very elegant and seemed to contradict the flexible nature of the parser generator. It was also very cumbersome to write.


Answer 1:


If you would like to parse a language, Marpa may be much better suited to the task. Here's a tutorial. You could also use regexp grammars.



Source: https://stackoverflow.com/questions/17407918/case-insensitive-keyword-matching
