Cannot compute minimal length of a parser - uu-parsinglib in Haskell

问题

Lets see the code snippet:

pSegmentBegin p i   = pIndentExact i *> ((:) <$> p i <*> ((pEOL *> pSegment p i) <|> pure []))

if I change this code in my parser to:

pSegmentBegin p i   = do
    pIndentExact i
    ((:) <$> p i <*> ((pEOL *> pSegment p i) <|> pure []))

I've got an error:

canot compute minmal length of a parser due to occurrence of a moadic bind, use addLength to override

I thought the above parser should behave the same way. Why this error can occur?

EDIT

The above example is very simple (to simplify the question) and as noted below it is not necessary to use do notation here, but the real case I wanted it to use is as follows:

pSegmentBegin p i   = do
    j <- pIndentAtLast i
    (:) <$> p j <*> ((pEOL *> pSegments p j) <|> pure [])

I have noticed that adding "addLength 1" before the do statement solves the problem, but I'm unsure if its a correct solution:

pSegmentBegin p i   = addLength 2 $ do
    j <- pIndentAtLast i
    (:) <$> p j <*> ((pEOL *> pSegments p j) <|> pure [])

回答1:

As I have mentioned many times the monadic interface should be avoided whenever possible. let me try to explain why the applicative interface is to be preferred.

One of the distinctive features of my library is that it performs error correction by inserting or deleting problems. Of course we can take an umlimited look-ahead here but that would make the process VERY expensive. So we take only a limited lookahead of three steps.

Now suppose we have to insert an expression and one of the expression alternatives is:

expr := "if" expr "then" expr "else" expr

then we want to exclude this alternative since choosing this alternative would necessitate the insertion of another expression etc. So we perform an abstract interpretation of the alternatives and make sure that in case of a draw (i.e. equal costs for the limited lookahead) we take one of the non-recursive alternatives.

Unfortunately this scheme breaks down when one writes monadic parsers, since the length of the right hand side of the bind may depend on the result of the left-hand side. So we issue the error message, and ask some help from the programmer to indicate the number of tokens this alternative might consume. The actual value does not matter so much, as long as you do not provide a finite length for something which is recursive and may lead to infinite insertions. It is only used to select the shortest alternative in case of an insertion.

This abstract interpretation has some costs and if you write all your parsers in monadic style it is unavoidable that this analysis is repeated over an over again. so: DO NOT WRITE MONADIC STYLE PARSERS WHEN USING THIS LIBRARY IF THERE IS AN APPLICATIVE ALTERNATIVE.

回答2:

It's trying to statically analyze how much input needs to be read in order to optimize performance, but that kind of optimization requires a statically known parser structure—the kind that can be built by Applicatives since the parser effect cannot depend upon the parser value such what (>>=) does.

So that's what goes wrong—when you use do notation it translates to a Monadic bind which breaks the Applicative predictor. It'd be nice if the library only exposed one of the two interfaces so that this kind of error cannot happen, but instead there's some inconsistency if you use both interfaces together in the same parser.

Since this use of do is strictly unnecessary—you're not using the extra power the monadic interface gives you—it's probably better to just avoid it.