Parsing many blocks with foldLine

Posted by 血红的双手 on 2019-12-23 17:53:17

Question


For this simplified problem, I am trying to parse an input that looks like

foo bar
 baz quux 
 woo
hoo xyzzy 
  glulx

into

[["foo", "bar", "baz", "quux", "woo"], ["hoo", "xyzzy", "glulx"]]

The code I've tried is as follows:

import qualified Text.Megaparsec.Lexer as L
import Text.Megaparsec hiding (space)
import Text.Megaparsec.Char hiding (space)
import Text.Megaparsec.String
import Control.Monad (void)
import Control.Applicative

space :: Parser ()
space = L.space (void spaceChar) empty empty

item :: Parser () -> Parser String
item sp = L.lexeme sp $ some letterChar

items :: Parser () -> Parser [String]
items sp = L.lineFold sp $ \sp' -> some (item sp')

items_ :: Parser [String]
items_ = items space

This works for one block of items:

λ» parseTest items_ "foo bar\n baz quux\n woo"
["foo","bar","baz","quux","woo"]

But as soon as I try to parse many items, it fails on the first unindented line:

λ» parseTest (many items_) "foo bar\n baz quux\n woo\nhoo xyzzy\n  glulx"
4:1:
incorrect indentation (got 1, should be greater than 1)

or, with an even simpler input:

λ» parseTest (many items_) "a\nb"
2:1:
incorrect indentation (got 1, should be greater than 1)

Answer 1:


Megaparsec's author here :-) One thing to remember when you work with Megaparsec is that its lexer module is intentionally “low-level”. It does not do anything you cannot build yourself, and it doesn't lock you into any particular “framework”. So in your case you have the space consumer sp' provided for you, but you should use it carefully, because it will fail whenever the indentation level is less than or equal to the indentation level at the start of the whole fold. That, by the way, is how your fold ends.
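
The indentation rule just described can be illustrated without Megaparsec at all. The following is only a plain-Haskell sketch of the intended grouping, not how the library implements line folds; the names indentLevel and folds are hypothetical helpers introduced for this illustration, and it assumes indentation is written with spaces only.

```haskell
-- 1-based column of the first non-space character of a line.
indentLevel :: String -> Int
indentLevel = (+ 1) . length . takeWhile (== ' ')

-- Group lines into folds. A line continues the current fold only when its
-- indentation is strictly greater than that of the fold's first line;
-- otherwise it starts a new fold.
folds :: [String] -> [[String]]
folds [] = []
folds (l : ls) = concatMap words (l : cont) : folds rest
  where
    ref = indentLevel l
    (cont, rest) = span ((> ref) . indentLevel) ls
```

For instance, folds (lines "a\nb") evaluates to [["a"],["b"]]: the second line is not indented more than the first, so it begins a new fold instead of continuing the old one. Note that this sketch treats a blank line as the start of an empty fold.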

To quote the docs:

Create a parser that supports line-folding. The first argument is used to consume white space between components of line fold, thus it must consume newlines in order to work properly. The second argument is a callback that receives custom space-consuming parser as argument. This parser should be used after separate components of line fold that can be put on different lines.

sc = L.space (void spaceChar) empty empty

myFold = L.lineFold sc $ \sc' -> do
  L.symbol sc' "foo"
  L.symbol sc' "bar"
  L.symbol sc  "baz" -- for the last symbol we use normal space consumer

A line fold cannot run indefinitely, so you should expect it to fail with an error message similar to the one you are getting now. For it to succeed, you need to give it a way to finish. This is usually done by using the “normal” space consumer at the end of the line fold:

space :: Parser ()
space = L.space (void spaceChar) empty empty

item :: Parser String
item = some letterChar

items :: Parser () -> Parser [String]
items sp = L.lineFold sp $ \sp' ->
  item `sepBy1` try sp' <* sp

items_ :: Parser [String]
items_ = items space

item `sepBy1` try sp' runs until sp' fails (try makes that failure backtrack), and then sp consumes the remaining white space, so the next fold can be parsed.

λ> parseTest items_ "foo bar\n baz quux\n woo"
["foo","bar","baz","quux","woo"]
λ> parseTest (many items_) "foo bar\n baz quux\n woo\nhoo xyzzy\n  glulx"
[["foo","bar","baz","quux","woo"],["hoo","xyzzy","glulx"]]
λ> parseTest (many items_) "foo bar\n baz quux\n woo\nhoo\nxyzzy\n  glulx"
[["foo","bar","baz","quux","woo"],["hoo"],["xyzzy","glulx"]]
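
The code above targets the older Text.Megaparsec.Lexer and Text.Megaparsec.String modules. As a rough sketch only, the same idea might look like this against more recent Megaparsec releases, assuming the lexer lives in Text.Megaparsec.Char.Lexer and that the Parser type has to be defined by hand; parseBlocks is a hypothetical wrapper added here for convenience, and the sketch has not been checked against every library version.

```haskell
import Control.Applicative (empty)
import Data.Void (Void)
import Text.Megaparsec
  (Parsec, eof, errorBundlePretty, many, parse, sepBy1, some, try)
import Text.Megaparsec.Char (letterChar, space1)
import qualified Text.Megaparsec.Char.Lexer as L

-- Newer releases no longer ship a ready-made String parser type.
type Parser = Parsec Void String

-- "Normal" space consumer: eats spaces and newlines, no comment syntax.
sc :: Parser ()
sc = L.space space1 empty empty

item :: Parser String
item = some letterChar

-- One fold: items separated by the fold-aware consumer sc', then the
-- normal consumer sc to skip the white space before the next fold.
items :: Parser [String]
items = L.lineFold sc $ \sc' ->
  (item `sepBy1` try sc') <* sc

-- Hypothetical helper: parse every fold in the input, or return the
-- rendered parse error.
parseBlocks :: String -> Either String [[String]]
parseBlocks = either (Left . errorBundlePretty) Right
            . parse (many items <* eof) ""
```

The structure is unchanged from the answer: try sc' backtracks when the next line is not indented enough, which ends the fold, and the trailing sc then moves past the newline so that many items can start the next fold.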


Source: https://stackoverflow.com/questions/37256316/parsing-many-blocks-with-foldline
