Using Parsec to parse regular expressions

浪子不回头ぞ 提交于 2019-12-03 07:45:19

You should use Parsec.Expr.buildExprParser; it is ideal for this purpose. You simply describe your operators, their precedence and associativity, and how to parse an atom, and the combinator builds the parser for you!

You probably also want to add the ability to group terms with parens so that you can apply * to more than just a single literal.

Here's my attempt (I threw in |, +, and ? for good measure):

import Control.Applicative
import Control.Monad
import Text.ParserCombinators.Parsec
import Text.ParserCombinators.Parsec.Expr

data Term = Literal Char
          | Sequence [Term]
          | Repeat (Int, Maybe Int) Term
          | Choice [Term]
  deriving ( Show )

term :: Parser Term
term = buildExpressionParser ops atom where

  ops = [ [ Postfix (Repeat (0, Nothing) <$ char '*')
          , Postfix (Repeat (1, Nothing) <$ char '+')
          , Postfix (Repeat (0, Just 1)  <$ char '?')
          ]
        , [ Infix (return sequence) AssocRight
          ]
        , [ Infix (choice <$ char '|') AssocRight
          ]
        ]

  atom = msum [ Literal <$> lit
              , parens term
              ]

  lit = noneOf "*+?|()"
  sequence a b = Sequence $ (seqTerms a) ++ (seqTerms b)
  choice a b = Choice $ (choiceTerms a) ++ (choiceTerms b)
  parens = between (char '(') (char ')')

  seqTerms (Sequence ts) = ts
  seqTerms t = [t]

  choiceTerms (Choice ts) = ts
  choiceTerms t = [t]

main = parseTest term "he(llo)*|wor+ld?"

Your grammar is left-recursive, which doesn’t play nice with try, as Parsec will repeatedly backtrack. There are a few ways around this. Probably the simplest is just making the * optional in another rule:

lit :: Parser (Char, Maybe Char)
lit = do
  c <- noneOf "*"
  s <- optionMaybe $ char '*'
  return (c, s)

Of course, you’ll probably end up wrapping things in a data type anyway, and there are a lot of ways to go about it. Here’s one, off the top of my head:

import Control.Applicative ((<$>))

data Term = Literal Char
          | Sequence [Term]
          | Star Term

expr :: Parser Term
expr = Sequence <$> many term

term :: Parser Term
term = do
  c <- lit
  s <- optionMaybe $ char '*' -- Easily extended for +, ?, etc.
  return $ if isNothing s
    then Literal c
    else Star $ Literal c

Maybe a more experienced Haskeller will come along with a better solution.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!