Question
For this simplified problem, I am trying to parse an input that looks like
foo bar
 baz quux
 woo
hoo xyzzy
 glulx
into
[["foo", "bar", "baz", "quux", "woo"], ["hoo", "xyzzy", "glulx"]]
The code I've tried is as follows:
import qualified Text.Megaparsec.Lexer as L
import Text.Megaparsec hiding (space)
import Text.Megaparsec.Char hiding (space)
import Text.Megaparsec.String
import Control.Monad (void)
import Control.Applicative
space :: Parser ()
space = L.space (void spaceChar) empty empty
item :: Parser () -> Parser String
item sp = L.lexeme sp $ some letterChar
items :: Parser () -> Parser [String]
items sp = L.lineFold sp $ \sp' -> some (item sp')
items_ :: Parser [String]
items_ = items space
This works for one block of items:
λ» parseTest items_ "foo bar\n baz quux\n woo"
["foo","bar","baz","quux","woo"]
But as soon as I try to parse many items, it fails on the first unindented line:
λ» parseTest (many items_) "foo bar\n baz quux\n woo\nhoo xyzzy\n glulx"
4:1:
incorrect indentation (got 1, should be greater than 1)
or, with an even simpler input:
λ» parseTest (many items_) "a\nb"
2:1:
incorrect indentation (got 1, should be greater than 1)
Answer 1:
Megaparsec's author here :-) One thing to remember when you work with
Megaparsec is that its lexer module is deliberately “low-level”: it does
not do anything you cannot build yourself, and it doesn't lock you into any
particular “framework”. In your case the space consumer sp' is provided for
you, but you should use it carefully: it will fail whenever the indentation
level is less than or equal to the indentation level at the start of the
whole fold. That, by the way, is how your fold ends. In your code every item,
including the last one of a fold, is followed by sp' (via L.lexeme sp'), so at
the unindented line sp' consumes the newline and then fails; because it has
already consumed input, the whole parse fails instead of the fold ending.
To quote the docs:
Create a parser that supports line-folding. The first argument is used to consume white space between components of line fold, thus it must consume newlines in order to work properly. The second argument is a callback that receives custom space-consuming parser as argument. This parser should be used after separate components of line fold that can be put on different lines.
sc = L.space (void spaceChar) empty empty
myFold = L.lineFold sc $ \sc' -> do
  L.symbol sc' "foo"
  L.symbol sc' "bar"
  L.symbol sc "baz" -- for the last symbol we use normal space consumer
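The callback's sp' is, in effect, "consume white space, then check the column". Assuming megaparsec 9.x, where the lexer lives in Text.Megaparsec.Char.Lexer and the Parser alias is defined by the user, a line fold can be assembled by hand from L.indentLevel and L.indentGuard. lineFold' is a hypothetical name; this is an illustrative sketch, not the library's source.

```haskell
import Control.Applicative (empty)
import Control.Monad (void)
import Data.Void (Void)
import Text.Megaparsec
import Text.Megaparsec.Char
import qualified Text.Megaparsec.Char.Lexer as L

type Parser = Parsec Void String

-- Space consumer that also eats newlines, as lineFold requires.
sc :: Parser ()
sc = L.space space1 empty empty

-- Hypothetical re-implementation of L.lineFold, for illustration only.
lineFold' :: Parser () -> (Parser () -> Parser a) -> Parser a
lineFold' spc action = do
  spc                  -- skip white space before the fold
  ref <- L.indentLevel -- column where the fold starts
  -- sp': consume white space, then require a column strictly greater
  -- than ref, otherwise fail with "incorrect indentation"
  action (void (L.indentGuard spc GT ref))
```

Because indentGuard runs the space consumer before checking the column, sp' can fail after it has already consumed a newline, which is why the solution that follows wraps it in try.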
A line fold cannot run indefinitely, so you should expect it to fail with an error message similar to the one you have right now. To succeed, you need to think about how the fold will finish. This is usually done by using the “normal” space consumer at the end of the line fold:
space :: Parser ()
space = L.space (void spaceChar) empty empty
item :: Parser String
item = some letterChar
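items :: Parser () -> Parser [String]
items sp = L.lineFold sp $ \sp' ->
  -- items are separated by sp'; `try` lets the separator fail and backtrack
  -- at a line that is not indented past the start of the fold
  item `sepBy1` try sp' <* sp  -- then the normal consumer eats that white space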
items_ :: Parser [String]
items_ = items space
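item `sepBy1` try sp' runs until it fails, and then sp grabs the rest, so the
next fold can be parsed. The try matters: at an unindented line sp' has already
consumed the newline when it fails, and try rewinds that input so sepBy1 can
stop cleanly.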
λ> parseTest items_ "foo bar\n baz quux\n woo"
["foo","bar","baz","quux","woo"]
λ> parseTest (many items_) "foo bar\n baz quux\n woo\nhoo xyzzy\n glulx"
[["foo","bar","baz","quux","woo"],["hoo","xyzzy","glulx"]]
λ> parseTest (many items_) "foo bar\n baz quux\n woo\nhoo\nxyzzy\n glulx"
[["foo","bar","baz","quux","woo"],["hoo"],["xyzzy","glulx"]]
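The code above uses the module layout of its time (Text.Megaparsec.Lexer, Text.Megaparsec.String). Assuming megaparsec 9.x, where the lexer is Text.Megaparsec.Char.Lexer and you declare the Parser type yourself, a port of the same solution might look like this; renaming the consumer to sc and using space1 instead of void spaceChar are choices of this sketch.

```haskell
import Control.Applicative (empty)
import Data.Void (Void)
import Text.Megaparsec
import Text.Megaparsec.Char
import qualified Text.Megaparsec.Char.Lexer as L

type Parser = Parsec Void String

-- Space consumer that eats spaces, tabs and newlines (no comments).
sc :: Parser ()
sc = L.space space1 empty empty

item :: Parser String
item = some letterChar

-- One line fold: items separated by the fold's consumer sp'. `try sp'`
-- backtracks at a line not indented past the fold's start, and the
-- trailing `sp` consumes that white space so the next fold can begin.
items :: Parser () -> Parser [String]
items sp = L.lineFold sp $ \sp' ->
  item `sepBy1` try sp' <* sp

items_ :: Parser [String]
items_ = items sc
```

Loaded into GHCi, parseTest (many items_) should reproduce the sessions above, for example splitting "a\nb" into [["a"],["b"]].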
Source: https://stackoverflow.com/questions/37256316/parsing-many-blocks-with-foldline