Question
How can I use parsec to parse all matched input in a string and discard the rest?
Example: I have a simple number parser, and I can find all the numbers if I know what separates them:
num :: Parser Int
-- many1 rather than many, so num never succeeds on empty input (read "" would fail)
num = read <$> many1 digit
parse (num `sepBy` space) "" "111 4 22"
But what if I don't know what is between the numbers?
"I will live to be 111 years <b>old</b> if I work out 4 days a week starting at 22."
many anyChar doesn't work as a separator, because it consumes everything.
So how can I get things that match an arbitrary parser surrounded by things I want to ignore?
EDIT: Note that in the real problem, my parser is more complicated:
optionTag :: Parser Fragment
optionTag = do
    string "<option"
    manyTill anyChar (string "value=")
    n <- many1 digit
    manyTill anyChar (char '>')
    chapterPrefix
    text <- many1 (noneOf "<>")
    return $ Option (read n) text
  where
    chapterPrefix = many digit >> char '.' >> many space
Answer 1:
For an arbitrary parser myParser, it's quite easy:
solution = many (let one = myParser <|> (anyChar >> one) in one)
It might be clearer to write it this way:
solution = many loop
  where
    loop = myParser <|> (anyChar >> loop)
Essentially, this defines a recursive parser (called loop) that keeps skipping characters until it reaches the first thing that can be parsed by myParser. many then repeats that search until it fails, i.e. at EOF.
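A minimal runnable sketch of this idea, assuming Text.Parsec and the question's number parser. The names scanFor, findAll and numbersIn are made up for illustration. The two uses of try are an addition to the code above: without them, loop fails after consuming input when the text ends with non-matching characters, and many then fails as well.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Example item parser from the question: one or more digits read as an Int.
num :: Parser Int
num = read <$> many1 digit

-- Skip characters one at a time until p matches, then return its result.
-- `try p` rewinds the input if p fails partway through a match.
scanFor :: Parser a -> Parser a
scanFor p = loop
  where
    loop = try p <|> (anyChar >> loop)

-- Collect every match. The outer `try` makes the last, unsuccessful scan
-- (which runs into end of input) fail without consuming, so `many` stops
-- and returns what it has found so far.
findAll :: Parser a -> Parser [a]
findAll p = many (try (scanFor p))

-- Hypothetical helper: run the search over a plain String.
numbersIn :: String -> Either ParseError [Int]
numbersIn = parse (findAll num) ""
```

With the question's sentence as input, numbersIn should return Right [111,4,22]. Because the final scan is retried from scratch and then abandoned, this sketch may re-read the trailing text once, which is usually acceptable for short inputs.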
Answer 2:
You can use
many ( noneOf "0123456789")
Note that noneOf takes a list of characters, not a parser, so it cannot be applied to digit directly. An equivalent way to write the same thing is
many $ noneOf ['0'..'9']
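As a rough sketch of how this could be wired up for the question's sentence, assuming Text.Parsec: runs of non-digit characters are treated as filler around each number. The names junk, numbers and extractNumbers are hypothetical. This approach only applies when a match can be recognized by its characters alone, so it would not carry over to something like the optionTag parser.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- One or more digits read as an Int.
num :: Parser Int
num = read <$> many1 digit

-- Filler: any run of non-digit characters, possibly empty.
junk :: Parser ()
junk = skipMany (noneOf ['0'..'9'])

-- Skip leading filler, then read numbers, each followed by optional filler.
numbers :: Parser [Int]
numbers = junk *> many (num <* junk)

-- Hypothetical helper: run the parser over a plain String.
extractNumbers :: String -> Either ParseError [Int]
extractNumbers = parse numbers ""
```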
Answer 3:
To find the item in the string, either the item is at the start of the string, or you consume one character and look for the item in the now-shorter string. If the item isn't right at the start of the string, you'll need to un-consume the characters used while looking for it, so you'll need a try block.
hasItem = prefixItem <* (many anyChar)
prefixItem = (try item) <|> (anyChar >> prefixItem)
item = <parser for your item here>
This code looks for just one occurrence of item in the string.
(AJFarmar almost has it.)
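To make the role of try concrete, here is a small sketch assuming Text.Parsec, with item chosen (purely as an example) to be the literal closing tag "</b>" from the question's sentence. In Parsec, string consumes input as soon as its first character matches, so the opening "<b>" is a partial match that has to be undone. prefixItemNoTry, findTag and findTagNoTry are hypothetical names added only for contrast.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Example item: a literal closing tag. `string` fails after consuming
-- input if only a prefix such as "<" matches.
item :: Parser String
item = string "</b>"

-- Search with backtracking, as in the answer above.
prefixItem :: Parser String
prefixItem = try item <|> (anyChar >> prefixItem)

-- Find one occurrence, then discard the rest of the input.
hasItem :: Parser String
hasItem = prefixItem <* many anyChar

-- The same search without `try`, kept only to show the difference:
-- a partial match of item consumes input, so <|> never tries the
-- right-hand alternative and the whole search fails.
prefixItemNoTry :: Parser String
prefixItemNoTry = item <|> (anyChar >> prefixItemNoTry)

findTag, findTagNoTry :: String -> Either ParseError String
findTag      = parse hasItem ""
findTagNoTry = parse prefixItemNoTry ""
```

On an input like "111 years <b>old</b> if", findTag should succeed with "</b>", while findTagNoTry should report a parse error at the opening "<b>".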
Answer 4:
The replace-megaparsec package allows you to split up a string into sections which match your pattern and sections which don't match by using the sepCap parser combinator.
import Data.Void (Void)
import Replace.Megaparsec
import Text.Megaparsec
import Text.Megaparsec.Char
let num :: Parsec Void String Int
    num = read <$> some digitChar
>>> parseTest (sepCap num) "I will live to be 111 years <b>old</b> if I work out 4 days a week starting at 22."
[Left "I will live to be "
,Right 111
,Left " years <b>old</b> if I work out "
,Right 4
,Left " days a week starting at "
,Right 22
,Left "."
]
Source: https://stackoverflow.com/questions/29549435/parsec-how-to-find-matches-within-a-string