FParsec identifiers vs keywords

Submitted by 可紊 on 2019-12-01 08:31:36

I think this problem is fairly simple. You have to:

  1. Parse out an entire word ([a-z]+), lower case only;
  2. Check whether it belongs to a dictionary of keywords; if so, return a keyword; otherwise fail, so that the parser falls back to the next alternative;
  3. Parse identifiers separately.

For example (hypothetical code, not tested):

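open FParsec
let keyWordSet =
    System.Collections.Generic.HashSet<_>(
        [|"while"; "begin"; "end"; "do"; "if"; "then"; "else"; "print"|]
    )
// Assumed helper: succeeds only if the next char is not a letter or digit.
let nonAlphaNumeric = notFollowedBy (satisfy (fun c -> isLetter c || isDigit c))
// attempt makes the parser backtrack, so pIdentifier can be tried on failure.
let pKeyword =
    attempt (
        (many1Satisfy isLower .>> nonAlphaNumeric) // [a-z]+
        >>= (fun s -> if keyWordSet.Contains(s) then preturn s else fail "not a keyword")
    )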

let pContent =
    pLineComment <|> pOperator <|> pNumeral <|> pKeyword <|> pIdentifier

Note that the code above parses a word twice whenever it turns out to be an identifier: once in pKeyword, which fails, and again in pIdentifier. To avoid this, you can instead:

  1. Parse out an entire word ([a-zA-Z][a-zA-Z0-9]*), i.e. everything alphanumeric;
  2. Check whether it is a keyword (lower case and present in the dictionary) or an identifier, and either
    1. Return a keyword
    2. Return an identifier
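
The single-pass variant described in the list above might be sketched as follows. This is only an illustration, not tested against a full grammar: the Token type and the names keywords, pWord and parseWord are hypothetical and do not come from the original answer, and the user state is assumed to be unit.

```fsharp
open FParsec

// Hypothetical token type, introduced only for this sketch.
type Token =
    | Keyword of string
    | Identifier of string

let keywords =
    set ["while"; "begin"; "end"; "do"; "if"; "then"; "else"; "print"]

// Parse one word matching [a-zA-Z][a-zA-Z0-9]* and classify it afterwards,
// so the input is scanned only once.
let pWord : Parser<Token, unit> =
    many1Satisfy2 isAsciiLetter (fun c -> isAsciiLetter c || isDigit c)
    |>> fun s -> if keywords.Contains s then Keyword s else Identifier s

// Small helper for trying the parser on a string.
let parseWord (input: string) : Token option =
    match run pWord input with
    | Success (token, _, _) -> Some token
    | Failure (_, _, _) -> None
```

With this sketch, parseWord "while" yields a Keyword, while parseWord "While" and parseWord "whilex" yield Identifiers, because the lookup is case-sensitive and applies to the whole word.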

P.S. Remember to put the "cheaper" parsers first in the alternation, as long as that does not change the logic.

EugeneK

You can define a parser for whitespace and check that a keyword or identifier is followed by it. For example, a generic whitespace parser could look like this:

let pWhiteSpace = pLineComment <|> pMultilineComment <|> pSpaces

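The following requires at least one whitespace element (assuming each alternative of pWhiteSpace consumes at least one character when it succeeds):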

let ws1 = skipMany1 pWhiteSpace

Then the parser for "if" looks like this:

let pIf = pstring "if" .>> ws1
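
One caveat: on input such as "iffy", pstring "if" .>> ws1 consumes "if" before failing, so a following <|> alternative is not tried. A minimal sketch of one way around this wraps the keyword parser in attempt. Here ws1 is replaced by FParsec's spaces1 as a stand-in for the whitespace parser above (comments are left out), and the helper names keyword and succeeds are hypothetical.

```fsharp
open FParsec

// Stand-in for the answer's ws1; comment parsers are omitted in this sketch.
let ws1 : Parser<unit, unit> = spaces1

// Hypothetical helper: match a keyword only when whitespace follows,
// and backtrack to the starting position if it does not.
let keyword (name: string) : Parser<string, unit> =
    attempt (pstring name .>> ws1)

let pIf = keyword "if"

// Small helper reporting whether a parser succeeds on the given input.
let succeeds (p: Parser<string, unit>) (input: string) : bool =
    match run p input with
    | Success (_, _, _) -> true
    | Failure (_, _, _) -> false
```

With attempt in place, succeeds (pIf <|> pstring "iffy") "iffy" is true, because the failed keyword match no longer leaves the input position after "if". Note that this sketch still rejects "if" at the very end of the input, since spaces1 needs at least one whitespace character.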