How to further improve error messages in Scala parser-combinator based parsers?

问题

I've coded a parser based on Scala parser combinators:

class SxmlParser extends RegexParsers with ImplicitConversions with PackratParsers {
    [...]
    lazy val document: PackratParser[AstNodeDocument] =
        ((procinst | element | comment | cdata | whitespace | text)*) ^^ {
            AstNodeDocument(_)
        }
    [...]
}
object SxmlParser {
    def parse(text: String): AstNodeDocument = {
        var ast = AstNodeDocument()
        val parser = new SxmlParser()
        val result = parser.parseAll(parser.document, new CharArrayReader(text.toArray))
        result match {
            case parser.Success(x, _) => ast = x
            case parser.NoSuccess(err, next) => {
                tool.die("failed to parse SXML input " +
                    "(line " + next.pos.line + ", column " + next.pos.column + "):\n" +
                    err + "\n" +
                    next.pos.longString)
            }
        }
        ast
    }
}

Usually the resulting parsing error messages are rather nice. But sometimes it becomes just

sxml: ERROR: failed to parse SXML input (line 32, column 1):
`"' expected but `' found
^

This happens if a quote characters is not closed and the parser reaches the EOT. What I would like to see here is (1) what production the parser was in when it expected the '"' (I've multiple ones) and (2) where in the input this production started parsing (which is an indicator where the opening quote is in the input). Does anybody know how I can improve the error messages and include more information about the actual internal parsing state when the error happens (perhaps something like a production rule stacktrace or whatever can be given reasonably here to better identify the error location). BTW, the above "line 32, column 1" is actually the EOT position and hence of no use here, of course.

回答1:

I don't know yet how to deal with (1), but I was also looking for (2) when I found this webpage:

https://wiki.scala-lang.org/plugins/viewsource/viewpagesrc.action?pageId=917624

I'm just copying the information:

A useful enhancement is to record the input position (line number and column number) of the significant tokens. To do this, you must do three things:

Make each output type extend scala.util.parsing.input.Positional

invoke the Parsers.positioned() combinator

Use a text source that records line and column positions

and

Finally, ensure that the source tracks positions. For streams, you can simply use scala.util.parsing.input.StreamReader; for Strings, use scala.util.parsing.input.CharArrayReader.

I'm currently playing with it so I'll try to add a simple example later

回答2:

In such cases you may use err, failure and ~! with production rules designed specifically to match the error.

来源：https://stackoverflow.com/questions/2906674/how-to-further-improve-error-messages-in-scala-parser-combinator-based-parsers

标签

scala

error-handling

parser-combinators