Parsing non-binary operators with Parsec

Posted by 半城伤御伤魂 on 2021-02-08 03:28:01

Question


Traditionally, arithmetic operators are treated as binary (left- or right-associative), so most tools handle only binary operators.

Is there an easy way to use Parsec to parse arithmetic operators that can take an arbitrary number of arguments?

For example, the following expression should be parsed into the tree

(a + b) + c + d * e + f


Answer 1:


Yes! The key is to first solve a simpler problem, which is to model + and * as tree nodes with only two children. To add four things, we'll just use + three times.
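To make the "use + three times" idea concrete, here is a small standalone sketch. The sumOf helper is a hypothetical name introduced only for this illustration, and the data type is a cut-down version of the one used in the parser code:

```haskell
-- A cut-down expression type: identifiers and binary addition only.
data Expr
  = Identifier String
  | Add Expr Expr
  deriving (Show, Eq)

-- Hypothetical helper: build a left-nested chain of binary Add nodes
-- from a non-empty list of terms (foldl1 fails on an empty list).
sumOf :: [Expr] -> Expr
sumOf = foldl1 Add

-- Four terms, three Add nodes: ((a + b) + c) + d
example :: Expr
example = sumOf (map Identifier ["a", "b", "c", "d"])
```

Evaluating example in GHCi gives Add (Add (Add (Identifier "a") (Identifier "b")) (Identifier "c")) (Identifier "d"), which is the same left-nested shape a left-associative infix parser produces.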

This simpler problem is convenient to solve because the Text.Parsec.Expr module exists for exactly this purpose. Your example can in fact be parsed by the example code in its documentation, which I've slightly simplified here:

module Lib where

import Text.Parsec
import Text.Parsec.Language
import qualified Text.Parsec.Expr as Expr
import qualified Text.Parsec.Token as Tokens

data Expr =
    Identifier String
  | Multiply Expr Expr
  | Add Expr Expr

instance Show Expr where
  show (Identifier s) = s
  show (Multiply l r) = "(* " ++ (show l) ++ " " ++ (show r) ++ ")"
  show (Add l r) = "(+ " ++ (show l) ++ " " ++ (show r) ++ ")"

-- Some sane parser combinators that we can plagiarize from the Haskell parser.
parens = Tokens.parens haskell
identifier = Tokens.identifier haskell
reserved = Tokens.reservedOp haskell

-- Infix parser.
infix_ operator func =
  Expr.Infix (reserved operator >> return func) Expr.AssocLeft

parser =
  Expr.buildExpressionParser table term <?> "expression"
  where
    table = [[infix_ "*" Multiply], [infix_ "+" Add]]

term =
  parens parser
  <|> (Identifier <$> identifier)
  <?> "term"

Running this in GHCi:

λ> runParser parser () "" "(a + b) + c + d * e + f"
Right (+ (+ (+ (+ a b) c) (* d e)) f)

There are lots of ways of converting this tree to the desired form. Here's a hacky gross slow one:

data Expr' =
    Identifier' String
  | Add' [Expr']
  | Multiply' [Expr']
  deriving (Show)

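-- Collect the operands of a chain of nodes that all satisfy the predicate.
-- Any node that fails the predicate is kept whole as a single operand.
collect :: Expr -> (Expr -> Bool) -> [Expr]
collect e f | not (f e) = [e]
collect (Add l r) f =
  collect l f ++ collect r f
collect (Multiply l r) f =
  collect l f ++ collect r f
-- An Identifier has no children, so it is always a single operand.
collect e _ = [e]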

isAdd :: Expr -> Bool
isAdd (Add _ _) = True
isAdd _ = False

isMultiply :: Expr -> Bool
isMultiply (Multiply _ _) = True
isMultiply _ = False

optimize :: Expr -> Expr'
optimize (Identifier s) = Identifier' s
optimize e@(Add _ _) = Add' (map optimize (collect e isAdd))
optimize e@(Multiply _ _) = Multiply' (map optimize (collect e isMultiply))
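
Another option, not part of the approach above, is to skip the binary tree and parse straight into list-shaped nodes with Parsec's sepBy1 combinator. The following is only a minimal sketch under my own assumptions: the NExpr type and the lexeme, symbol, nary, sumP, productP, atom and parseNary names are all hypothetical, identifiers are restricted to letters, and it uses a hand-rolled lexer instead of Text.Parsec.Token:

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical n-ary expression type: each operator node holds a list.
data NExpr
  = Var String
  | Sum [NExpr]
  | Product [NExpr]
  deriving (Show, Eq)

-- Run a parser, then skip any trailing whitespace.
lexeme :: Parser a -> Parser a
lexeme p = p <* spaces

symbol :: Char -> Parser Char
symbol = lexeme . char

-- Do not wrap a single operand in an operator node.
nary :: ([NExpr] -> NExpr) -> [NExpr] -> NExpr
nary _ [x] = x
nary f xs = f xs

-- Lowest precedence: one or more products separated by '+'.
sumP :: Parser NExpr
sumP = nary Sum <$> sepBy1 productP (symbol '+')

-- Higher precedence: one or more atoms separated by '*'.
productP :: Parser NExpr
productP = nary Product <$> sepBy1 atom (symbol '*')

-- A parenthesized expression or an identifier made of letters.
atom :: Parser NExpr
atom = between (symbol '(') (symbol ')') sumP
   <|> (Var <$> lexeme (many1 letter))

parseNary :: String -> Either ParseError NExpr
parseNary = parse (spaces *> sumP <* eof) ""
```

With this sketch, parseNary "(a + b) + c + d * e + f" yields a Sum node with four operands: a nested Sum of a and b, then c, then a Product of d and e, then f. Unlike the collect-based conversion, the parenthesized group stays a separate nested node, so whether that is the desired tree depends on how explicit parentheses should be treated.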

I will note, however, that almost always Expr is Good Enough™ for the purposes of a parser or compiler.



Source: https://stackoverflow.com/questions/33987104/parsing-non-binary-operators-with-parsec
